Decouple litd and lnd Shutdown Interceptors #1201

@ffranr

Summary

Currently, litd and lnd share a single signal.Interceptor instance. This tight coupling means that any shutdown of lnd (whether intentional or due to errors) cascades to shut down litd entirely. This prevents litd from maintaining its status endpoint after lnd stops, reducing observability during failure scenarios.

This issue proposes decoupling the shutdown interceptors so that litd can continue running (with its status endpoint available) even after lnd has stopped.

Background

Current Architecture

litd creates a single signal.Interceptor and passes it to lnd:

// terminal.go:269-270
shutdownInterceptor, err := signal.Intercept()

// terminal.go:579
err := lnd.Main(g.cfg.Lnd, lisCfg, implCfg, interceptor)

The interceptor is also stored as a package-level variable for use by the logger and RPC handlers:

// log.go:33-35
var interceptor signal.Interceptor

The Problem

When lnd shuts down (via the StopDaemon RPC or any other means), it calls interceptor.RequestShutdown() on the shared interceptor. This closes the channel returned by ShutdownChannel(), which litd is blocked on:

// terminal.go:435-437
<-shutdownInterceptor.ShutdownChannel()
log.Infof("Shutdown signal received")

Result: litd exits, and its status endpoint becomes unavailable.

Why This Matters

In PR #1183, we added fail-fast behavior for critical sub-server startup failures (e.g., tapd). When tapd fails to start, we call StopDaemon on lnd to shut it down promptly. However, due to the shared interceptor, this also shuts down litd entirely.

Ideally, litd's status endpoint should remain available after such failures so operators can:

  • Query the status endpoint to understand what failed
  • Retrieve error details programmatically
  • Monitor the system state via external tools

Proposed Solution

High-Level Approach

  1. Create a separate interceptor for lnd that doesn't cascade to litd's shutdown
  2. Have litd monitor lnd's lifecycle independently via the existing lndQuit channel
  3. Keep litd running after lnd stops, with the status endpoint reporting the failure state
  4. Only shut down litd when:
    • An OS signal is received (SIGTERM/SIGINT)
    • The user explicitly calls litd's StopDaemon RPC
    • A fatal litd-specific error occurs

Implementation Steps

Step 1: Create a Custom Interceptor Wrapper

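Create a wrapper that exposes the same methods as signal.Interceptor but lets litd control whether lnd's shutdown propagates. This assumes lnd.Main can accept the wrapper; if its interceptor parameter is the concrete signal.Interceptor type rather than an interface, lnd would first need to accept an interface (or an equivalent hook) for this to compile: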

// interceptor.go (new file)
package terminal

import (
    "sync"

    "github.com/lightningnetwork/lnd/signal"
)

// LndInterceptor wraps a signal.Interceptor for lnd, allowing litd to
// monitor lnd shutdown without being forced to shut down itself.
type LndInterceptor struct {
    // Interceptor is litd's main interceptor, embedded so that lnd
    // inherits OS signal handling.
    signal.Interceptor

    // lndShutdownChan is closed when lnd requests shutdown.
    lndShutdownChan chan struct{}

    // closeOnce ensures lndShutdownChan is closed at most once, even if
    // RequestShutdown is called concurrently.
    closeOnce sync.Once

    // merged is closed when either litd's interceptor fires or lnd
    // requests shutdown.
    merged chan struct{}
}

// NewLndInterceptor creates a new interceptor for lnd that doesn't
// cascade shutdown to litd.
func NewLndInterceptor(litdInterceptor signal.Interceptor) *LndInterceptor {
    i := &LndInterceptor{
        Interceptor:     litdInterceptor,
        lndShutdownChan: make(chan struct{}),
        merged:          make(chan struct{}),
    }

    // Merge the two shutdown sources once, up front, so that repeated
    // ShutdownChannel calls don't each spawn a goroutine.
    go func() {
        defer close(i.merged)

        select {
        case <-i.Interceptor.ShutdownChannel():
        case <-i.lndShutdownChan:
        }
    }()

    return i
}

// RequestShutdown is called by lnd when it wants to shut down.
// We close our local channel but don't propagate to litd.
func (i *LndInterceptor) RequestShutdown() {
    i.closeOnce.Do(func() {
        close(i.lndShutdownChan)
    })
}

// ShutdownChannel returns a channel that's closed when lnd should shut down.
// This is triggered by litd's interceptor (OS signals or litd's own
// shutdown request) OR by lnd's internal shutdown request.
func (i *LndInterceptor) ShutdownChannel() <-chan struct{} {
    return i.merged
}

// LndShutdownChannel returns a channel that's closed only when lnd
// requests shutdown (not on OS signals).
func (i *LndInterceptor) LndShutdownChannel() <-chan struct{} {
    return i.lndShutdownChan
}

Step 2: Modify terminal.go to Use Separate Interceptors

// terminal.go - Run() method

// Create litd's main interceptor for OS signals
shutdownInterceptor, err := signal.Intercept()
if err != nil {
    return fmt.Errorf("could not intercept signals: %v", err)
}

// Create a separate interceptor for lnd
lndInterceptor := NewLndInterceptor(shutdownInterceptor)

// ... later, when starting lnd ...

err := lnd.Main(g.cfg.Lnd, lisCfg, implCfg, lndInterceptor)

Step 3: Update the Main Event Loop

Modify the shutdown handling to distinguish between lnd shutdown and litd shutdown:

// terminal.go - after start() returns

select {
case <-shutdownInterceptor.ShutdownChannel():
    // OS signal or litd StopDaemon - shut down everything
    log.Infof("Shutdown signal received, stopping all services")

case <-lndInterceptor.LndShutdownChannel():
    // lnd stopped but litd should keep running
    log.Infof("LND has stopped, litd status endpoint remains available")

    // Keep running until explicit shutdown
    <-shutdownInterceptor.ShutdownChannel()
    log.Infof("Shutdown signal received")
}

err = g.shutdownSubServers()
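
The two-phase wait above can be tried out in isolation with plain channels. The following is a minimal sketch, not litd code: waitForShutdown and its parameters are hypothetical names, and the two channels stand in for shutdownInterceptor.ShutdownChannel() and lndInterceptor.LndShutdownChannel().

```go
package main

import "fmt"

// waitForShutdown mirrors the two-phase wait: it always blocks until
// litdShutdown is closed, and reports whether lnd was seen to stop first.
// onLndStopped runs once, at the moment lnd's stop is observed.
func waitForShutdown(litdShutdown, lndStopped <-chan struct{},
	onLndStopped func()) bool {

	select {
	case <-litdShutdown:
		// OS signal or litd StopDaemon: shut everything down.
		return false

	case <-lndStopped:
		// lnd is gone, but keep serving until an explicit shutdown.
		onLndStopped()
		<-litdShutdown
		return true
	}
}

func main() {
	litdShutdown := make(chan struct{})
	lndStopped := make(chan struct{})

	// Simulate lnd stopping while litd is still healthy.
	close(lndStopped)

	lndFirst := waitForShutdown(litdShutdown, lndStopped, func() {
		fmt.Println("lnd stopped, status endpoint still serving")

		// Stand-in for a later SIGTERM or litd StopDaemon call.
		close(litdShutdown)
	})
	fmt.Println("lnd stopped first:", lndFirst)
}
```

If both channels are already closed when the select runs, Go picks one of the ready cases at random, so the lnd branch is not guaranteed to be taken in that situation. Either branch still ends with litd shutting down, which is why the real loop can tolerate the race.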

Step 4: Update the Package-Level Interceptor in log.go

The package-level interceptor variable in log.go is used to trigger shutdown on critical errors. It should continue to reference litd's interceptor, not lnd's:

// log.go - no changes needed to the variable itself, but ensure
// SetupLoggers receives litd's interceptor, not lnd's

Step 5: Update rpc_proxy.go StopDaemon

The litd StopDaemon RPC should continue to shut down everything:

// rpc_proxy.go - no changes needed
// interceptor.RequestShutdown() still uses litd's interceptor

Step 6: Handle lnd Shutdown in Status Manager

When lnd stops, update the status manager to reflect this:

case <-lndQuit:
    g.statusMgr.SetErrored(
        subservers.LND, "lnd has stopped",
    )
    // Don't return error - keep litd running
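
To make the intent of this step concrete, here is a self-contained sketch of a watcher that records lnd's exit and then returns without requesting litd shutdown. The statusManager type, its methods, and watchLnd are hypothetical stand-ins written for this example; they are not litd's actual status manager API.

```go
package main

import (
	"fmt"
	"sync"
)

// statusManager is a hypothetical, minimal stand-in for litd's status
// manager: it only remembers the last error string per sub-server.
type statusManager struct {
	mu   sync.RWMutex
	errs map[string]string
}

func newStatusManager() *statusManager {
	return &statusManager{errs: make(map[string]string)}
}

// SetErrored records an error message for the named sub-server.
func (s *statusManager) SetErrored(name, msg string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.errs[name] = msg
}

// Error returns the recorded error for the named sub-server, if any.
func (s *statusManager) Error(name string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	msg, ok := s.errs[name]
	return msg, ok
}

// watchLnd waits for lndQuit to close, records the stop in the status
// manager, and exits without touching any litd shutdown channel. The
// returned channel is closed once the status has been recorded.
func watchLnd(lndQuit <-chan struct{}, s *statusManager) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		<-lndQuit
		s.SetErrored("lnd", "lnd has stopped")
	}()
	return done
}

func main() {
	s := newStatusManager()
	lndQuit := make(chan struct{})
	recorded := watchLnd(lndQuit, s)

	close(lndQuit) // Simulate lnd exiting.
	<-recorded

	msg, _ := s.Error("lnd")
	fmt.Println(msg) // prints "lnd has stopped"
}
```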

Files to Modify

File             Changes
terminal.go      Create separate lnd interceptor, update event loop
interceptor.go   New file - LndInterceptor wrapper
log.go           Ensure it uses litd's interceptor (may need verification)
rpc_proxy.go     Verify it uses litd's interceptor
config.go        Update loadAndValidateConfig to accept litd interceptor

Testing

  1. Unit tests: Test the LndInterceptor wrapper in isolation
  2. Integration tests:
    • Start litd in integrated mode
    • Cause lnd to shut down (e.g., via StopDaemon)
    • Verify litd's status endpoint is still accessible
    • Verify status endpoint reports lnd as stopped/errored
    • Verify litd shuts down cleanly on SIGTERM
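
For the unit-test item, the properties worth asserting are that repeated or concurrent RequestShutdown calls are safe, that lnd's own request closes the merged channel, and that litd's channel stays open. The sketch below checks these on a stdlib-only model of the wrapper; childInterceptor, isClosed, and describe are hypothetical names, and the parent channel stands in for litd's ShutdownChannel().

```go
package main

import (
	"fmt"
	"sync"
)

// childInterceptor is a stdlib-only model of the proposed wrapper.
type childInterceptor struct {
	own    chan struct{} // closed when the child requests shutdown
	once   sync.Once     // guards own against a double close
	merged chan struct{} // closed when parent or child shuts down
}

func newChildInterceptor(parent <-chan struct{}) *childInterceptor {
	c := &childInterceptor{
		own:    make(chan struct{}),
		merged: make(chan struct{}),
	}
	go func() {
		defer close(c.merged)
		select {
		case <-parent:
		case <-c.own:
		}
	}()
	return c
}

func (c *childInterceptor) RequestShutdown() {
	c.once.Do(func() { close(c.own) })
}

func (c *childInterceptor) ShutdownChannel() <-chan struct{} {
	return c.merged
}

func (c *childInterceptor) LndShutdownChannel() <-chan struct{} {
	return c.own
}

// isClosed reports whether ch is closed, without blocking.
func isClosed(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

// describe summarizes both sides' state for logging or assertions.
func describe(c *childInterceptor, parent <-chan struct{}) string {
	return fmt.Sprintf("lnd requested shutdown: %t, litd shutting down: %t",
		isClosed(c.LndShutdownChannel()), isClosed(parent))
}

func main() {
	parent := make(chan struct{})
	c := newChildInterceptor(parent)

	// Concurrent requests must not panic with a double close.
	var wg sync.WaitGroup
	for n := 0; n < 8; n++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.RequestShutdown()
		}()
	}
	wg.Wait()

	<-c.ShutdownChannel() // lnd's view: time to shut down
	fmt.Println(describe(c, parent))
	// prints "lnd requested shutdown: true, litd shutting down: false"
}
```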

Considerations

OS Signal Handling

The signal.Intercept() function registers handlers for SIGINT/SIGTERM. We need to ensure:

  • Only one interceptor registers OS signal handlers (litd's)
  • lnd's interceptor inherits OS signal notifications from litd's

Graceful Degradation

When lnd stops but litd continues:

  • Sub-servers that depend on lnd will fail
  • Status endpoint should clearly indicate the degraded state
  • Consider adding a "degraded" status in addition to "running"/"stopped"/"errored"
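
One possible reading of the "degraded" suggestion is an overall status derived from the per-sub-server states. The sketch below is only an illustration of that idea: the state type, the overall function, and the rule it applies are assumptions made for this example and are not part of litd.

```go
package main

import "fmt"

// state is a hypothetical per-sub-server state.
type state int

const (
	stateRunning state = iota
	stateStopped
	stateErrored
)

// overall derives a single summary string from per-sub-server states:
// "running" if everything runs, "degraded" if only some sub-servers
// run, "errored" if none run and at least one errored, else "stopped".
func overall(subs map[string]state) string {
	var running, errored int
	for _, st := range subs {
		switch st {
		case stateRunning:
			running++
		case stateErrored:
			errored++
		}
	}

	switch {
	case len(subs) > 0 && running == len(subs):
		return "running"
	case running > 0:
		return "degraded"
	case errored > 0:
		return "errored"
	default:
		return "stopped"
	}
}

func main() {
	// lnd has stopped with an error while another sub-server keeps running.
	subs := map[string]state{
		"lnd":  stateErrored,
		"loop": stateRunning,
	}
	fmt.Println(overall(subs)) // prints "degraded"
}
```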

Remote Mode

This change primarily affects integrated mode. Remote mode already has lnd running separately, so the interceptor coupling doesn't apply.

Backward Compatibility

  • External behavior change: litd will no longer exit when lnd stops in integrated mode
  • Operators who rely on litd exiting when lnd stops may need to update their monitoring
  • Consider a config flag to preserve the old behavior if needed

Acceptance Criteria

  • litd continues running after lnd stops in integrated mode
  • Status endpoint remains accessible and reports lnd's stopped state
  • OS signals (SIGTERM/SIGINT) still shut down both lnd and litd
  • litd's StopDaemon RPC still shuts down both lnd and litd
  • No regression in remote mode behavior
  • Integration tests cover the new behavior
  • Documentation updated to reflect the change

Labels: enhancement
