Summary
Currently, litd and lnd share a single signal.Interceptor instance. This tight coupling means that any shutdown of lnd (whether intentional or due to errors) cascades to shut down litd entirely. This prevents litd from maintaining its status endpoint after lnd stops, reducing observability during failure scenarios.
This issue proposes decoupling the shutdown interceptors so that litd can continue running (with its status endpoint available) even after lnd has stopped.
Background
Current Architecture
litd creates a single signal.Interceptor and passes it to lnd:
// terminal.go:269-270
shutdownInterceptor, err := signal.Intercept()
// terminal.go:579
err := lnd.Main(g.cfg.Lnd, lisCfg, implCfg, interceptor)
The interceptor is also stored as a package-level variable for use by the logger and RPC handlers:
// log.go:33-35
var interceptor signal.Interceptor
The Problem
When lnd shuts down (via StopDaemon RPC or any other means), it calls interceptor.RequestShutdown() on the shared interceptor. This closes the ShutdownChannel() that litd is waiting on:
// terminal.go:435-437
<-shutdownInterceptor.ShutdownChannel()
log.Infof("Shutdown signal received")
Result: litd exits, and its status endpoint becomes unavailable.
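The coupling can be reproduced without lnd at all. The sketch below uses a hypothetical stand-in type, `interceptor`, that models only `RequestShutdown()` and `ShutdownChannel()`; it is not lnd's implementation, just enough to show that one shared instance means one shared shutdown channel.

```go
package main

import (
	"fmt"
	"sync"
)

// interceptor is a hypothetical stand-in for lnd's signal.Interceptor. It
// models only the two methods discussed above.
type interceptor struct {
	once     sync.Once
	shutdown chan struct{}
}

func newInterceptor() *interceptor {
	return &interceptor{shutdown: make(chan struct{})}
}

// RequestShutdown closes the shutdown channel exactly once.
func (i *interceptor) RequestShutdown() {
	i.once.Do(func() { close(i.shutdown) })
}

// ShutdownChannel returns the channel every holder of the interceptor waits on.
func (i *interceptor) ShutdownChannel() <-chan struct{} {
	return i.shutdown
}

// litdStopsWithLnd reports whether litd's wait channel is closed after lnd
// requests shutdown on an interceptor that both daemons share.
func litdStopsWithLnd() bool {
	shared := newInterceptor()

	// lnd side: a shutdown request lands on the shared instance.
	shared.RequestShutdown()

	// litd side: this is the same channel the main loop blocks on.
	select {
	case <-shared.ShutdownChannel():
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("litd stops with lnd:", litdStopsWithLnd())
}
```

Because both sides hold the same channel, there is no point at which litd could tell "lnd asked to stop" apart from "everything should stop".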
Why This Matters
In PR #1183, we added fail-fast behavior for critical sub-server startup failures (e.g., tapd). When tapd fails to start, we call StopDaemon on lnd to shut it down promptly. However, due to the shared interceptor, this also shuts down litd entirely.
Ideally, litd's status endpoint should remain available after such failures so operators can:
- Query the status endpoint to understand what failed
- Retrieve error details programmatically
- Monitor the system state via external tools
Proposed Solution
High-Level Approach
- Create a separate interceptor for lnd that doesn't cascade to litd's shutdown
- Have litd monitor lnd's lifecycle independently via the existing lndQuit channel
- Keep litd running after lnd stops, with the status endpoint reporting the failure state
- Only shut down litd when:
  - An OS signal is received (SIGTERM/SIGINT)
  - The user explicitly calls litd's StopDaemon RPC
  - A fatal litd-specific error occurs
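The shutdown policy in the list above can be written down as a small decision function. This is only a sketch: `shutdownCause`, its values, and `stopsLitd` are hypothetical names introduced here and do not exist in litd.

```go
package main

import "fmt"

// shutdownCause enumerates the events from the list above. The type and its
// values are hypothetical names used only in this sketch.
type shutdownCause int

const (
	causeOSSignal       shutdownCause = iota // SIGTERM or SIGINT
	causeLitdStopDaemon                      // litd's own StopDaemon RPC
	causeLitdFatalError                      // fatal litd-specific error
	causeLndStopped                          // lnd exited or requested shutdown
)

// stopsLitd reports whether a cause should terminate litd under the proposal.
func stopsLitd(c shutdownCause) bool {
	switch c {
	case causeOSSignal, causeLitdStopDaemon, causeLitdFatalError:
		return true
	default:
		// lnd stopping only changes the reported status.
		return false
	}
}

func main() {
	fmt.Println(stopsLitd(causeOSSignal), stopsLitd(causeLndStopped))
}
```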
Implementation Steps
Step 1: Create a Custom Interceptor Wrapper
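Create a wrapper that exposes the same methods as signal.Interceptor but lets litd control whether lnd's shutdown propagates (if lnd.Main accepts only the concrete signal.Interceptor type, passing the wrapper to it would also need a change on the lnd side):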
// interceptor.go (new file)
package terminal

import (
	"sync"

	"github.com/lightningnetwork/lnd/signal"
)

// LndInterceptor wraps a signal.Interceptor for lnd, allowing litd to
// monitor lnd shutdown without being forced to shut down itself.
type LndInterceptor struct {
	signal.Interceptor

	// lndShutdownChan is closed when lnd requests shutdown.
	lndShutdownChan chan struct{}

	// merged is closed on an OS signal or an lnd shutdown request.
	merged chan struct{}

	// closeOnce makes RequestShutdown safe to call more than once and
	// from multiple goroutines.
	closeOnce sync.Once

	// litdInterceptor is the main litd interceptor, only used for
	// OS signal propagation.
	litdInterceptor signal.Interceptor
}

// NewLndInterceptor creates a new interceptor for lnd that doesn't
// cascade shutdown to litd.
func NewLndInterceptor(litdInterceptor signal.Interceptor) *LndInterceptor {
	i := &LndInterceptor{
		Interceptor:     litdInterceptor, // Inherit OS signal handling
		lndShutdownChan: make(chan struct{}),
		merged:          make(chan struct{}),
		litdInterceptor: litdInterceptor,
	}

	// Merge the OS signal channel with the lnd-specific shutdown channel
	// once, so ShutdownChannel always returns the same channel and does
	// not start a new goroutine on every call.
	go func() {
		defer close(i.merged)
		select {
		case <-i.litdInterceptor.ShutdownChannel():
		case <-i.lndShutdownChan:
		}
	}()

	return i
}

// RequestShutdown is called by lnd when it wants to shut down.
// We close our local channel but don't propagate to litd.
func (i *LndInterceptor) RequestShutdown() {
	i.closeOnce.Do(func() { close(i.lndShutdownChan) })
}

// ShutdownChannel returns a channel that's closed when lnd should shut down.
// This is triggered by OS signals OR lnd's internal shutdown request.
func (i *LndInterceptor) ShutdownChannel() <-chan struct{} {
	return i.merged
}

// LndShutdownChannel returns a channel that's closed only when lnd
// requests shutdown (not on OS signals).
func (i *LndInterceptor) LndShutdownChannel() <-chan struct{} {
	return i.lndShutdownChan
}
Step 2: Modify terminal.go to Use Separate Interceptors
// terminal.go - Run() method
// Create litd's main interceptor for OS signals
shutdownInterceptor, err := signal.Intercept()
if err != nil {
	return fmt.Errorf("could not intercept signals: %v", err)
}
// Create a separate interceptor for lnd
lndInterceptor := NewLndInterceptor(shutdownInterceptor)
// ... later, when starting lnd ...
err := lnd.Main(g.cfg.Lnd, lisCfg, implCfg, lndInterceptor)
Step 3: Update the Main Event Loop
Modify the shutdown handling to distinguish between lnd shutdown and litd shutdown:
// terminal.go - after start() returns
select {
case <-shutdownInterceptor.ShutdownChannel():
	// OS signal or litd StopDaemon - shut down everything
	log.Infof("Shutdown signal received, stopping all services")
case <-lndInterceptor.LndShutdownChannel():
	// lnd stopped but litd should keep running
	log.Infof("LND has stopped, litd status endpoint remains available")
	// Keep running until explicit shutdown
	<-shutdownInterceptor.ShutdownChannel()
	log.Infof("Shutdown signal received")
}
err = g.shutdownSubServers()
Step 4: Update the Package-Level Interceptor in log.go
The package-level interceptor variable in log.go is used for critical error shutdown. This should continue to use litd's interceptor:
// log.go - no changes needed to the variable itself, but ensure
// SetupLoggers receives litd's interceptor, not lnd's
Step 5: Update rpc_proxy.go StopDaemon
The litd StopDaemon RPC should continue to shut down everything:
// rpc_proxy.go - no changes needed
// interceptor.RequestShutdown() still uses litd's interceptor
Step 6: Handle lnd Shutdown in Status Manager
When lnd stops, update the status manager to reflect this:
case <-lndQuit:
	g.statusMgr.SetErrored(
		subservers.LND, "lnd has stopped",
	)
	// Don't return error - keep litd running
Files to Modify
| File | Changes |
|---|---|
| terminal.go | Create separate lnd interceptor, update event loop |
| interceptor.go | New file - LndInterceptor wrapper |
| log.go | Ensure it uses litd's interceptor (may need verification) |
| rpc_proxy.go | Verify it uses litd's interceptor |
| config.go | Update loadAndValidateConfig to accept litd interceptor |
Testing
- Unit tests: Test the LndInterceptor wrapper in isolation
- Integration tests:
  - Start litd in integrated mode
  - Cause lnd to shut down (e.g., via StopDaemon)
  - Verify litd's status endpoint is still accessible
  - Verify status endpoint reports lnd as stopped/errored
  - Verify litd shuts down cleanly on SIGTERM
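As a starting point for the unit tests, the checks below exercise the wrapper's intended contract against stand-ins that need no lnd dependency. `parent` and `lndWrapper` are hypothetical types that mirror the shape of signal.Interceptor and the proposed LndInterceptor; a real test would target the actual types through Go's testing package.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// parent is a hypothetical stand-in for litd's signal.Interceptor.
type parent struct {
	once sync.Once
	done chan struct{}
}

func newParent() *parent { return &parent{done: make(chan struct{})} }

func (p *parent) RequestShutdown()                 { p.once.Do(func() { close(p.done) }) }
func (p *parent) ShutdownChannel() <-chan struct{} { return p.done }

// lndWrapper is a hypothetical stand-in for the proposed LndInterceptor.
type lndWrapper struct {
	once    sync.Once
	lndDone chan struct{}
	merged  chan struct{}
}

func newLndWrapper(p *parent) *lndWrapper {
	w := &lndWrapper{lndDone: make(chan struct{}), merged: make(chan struct{})}
	// Close merged when either the parent or lnd itself shuts down.
	go func() {
		defer close(w.merged)
		select {
		case <-p.ShutdownChannel():
		case <-w.lndDone:
		}
	}()
	return w
}

func (w *lndWrapper) RequestShutdown()                    { w.once.Do(func() { close(w.lndDone) }) }
func (w *lndWrapper) ShutdownChannel() <-chan struct{}    { return w.merged }
func (w *lndWrapper) LndShutdownChannel() <-chan struct{} { return w.lndDone }

// isClosed reports whether ch is closed, without blocking.
func isClosed(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

// checkLndShutdownIsIsolated has lnd request shutdown from several goroutines
// and verifies that only the wrapper's channels close.
func checkLndShutdownIsIsolated() error {
	p := newParent()
	w := newLndWrapper(p)

	var wg sync.WaitGroup
	for n := 0; n < 8; n++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			w.RequestShutdown() // repeated calls must not panic
		}()
	}
	wg.Wait()

	<-w.ShutdownChannel() // lnd's view of shutdown closes
	if !isClosed(w.LndShutdownChannel()) {
		return errors.New("lnd shutdown channel still open")
	}
	if isClosed(p.ShutdownChannel()) {
		return errors.New("lnd shutdown cascaded to litd")
	}
	return nil
}

// checkParentShutdownReachesLnd verifies that a litd-level shutdown is
// visible to lnd without marking lnd as having requested it.
func checkParentShutdownReachesLnd() error {
	p := newParent()
	w := newLndWrapper(p)

	p.RequestShutdown()
	<-w.ShutdownChannel()
	if isClosed(w.LndShutdownChannel()) {
		return errors.New("lnd-only channel closed on litd shutdown")
	}
	return nil
}

func main() {
	fmt.Println(checkLndShutdownIsIsolated(), checkParentShutdownReachesLnd())
}
```

The concurrent calls to `RequestShutdown` are deliberate: a non-blocking "check then close" on a channel can panic when two goroutines race, which is the case a `sync.Once` guard rules out.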
Considerations
OS Signal Handling
The signal.Intercept() function registers handlers for SIGINT/SIGTERM. We need to ensure:
- Only one interceptor registers OS signal handlers (litd's)
- lnd's interceptor inherits OS signal notifications from litd's
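One way to satisfy both points is to register with os/signal once and derive lnd's view from the resulting channel. The sketch below uses only the standard library; `watchSignals`, `deriveLndView`, and `signalReachesBoth` are names assumed for illustration and say nothing about how lnd's signal package is built internally.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// watchSignals registers for SIGINT/SIGTERM exactly once. It returns the raw
// signal channel and a channel that is closed on the first signal.
func watchSignals() (chan os.Signal, <-chan struct{}) {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)

	shutdown := make(chan struct{})
	go func() {
		<-sigCh
		close(shutdown)
	}()
	return sigCh, shutdown
}

// deriveLndView returns lnd's shutdown channel. It closes when either the
// OS-level shutdown channel or lnd's own request channel closes, and it does
// not call signal.Notify a second time.
func deriveLndView(osShutdown, lndRequest <-chan struct{}) <-chan struct{} {
	merged := make(chan struct{})
	go func() {
		defer close(merged)
		select {
		case <-osShutdown:
		case <-lndRequest:
		}
	}()
	return merged
}

// signalReachesBoth simulates one signal and reports whether both litd's and
// lnd's shutdown channels close within a second.
func signalReachesBoth() bool {
	sigCh, litdView := watchSignals()
	defer signal.Stop(sigCh)
	lndView := deriveLndView(litdView, make(chan struct{}))

	// Simulate delivery by writing to the channel instead of signalling
	// the running process.
	sigCh <- syscall.SIGTERM

	for _, ch := range []<-chan struct{}{litdView, lndView} {
		select {
		case <-ch:
		case <-time.After(time.Second):
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("signal reaches both:", signalReachesBoth())
}
```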
Graceful Degradation
When lnd stops but litd continues:
- Sub-servers that depend on lnd will fail
- Status endpoint should clearly indicate the degraded state
- Consider adding a "degraded" status in addition to "running"/"stopped"/"errored"
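If a "degraded" status is added, the overall state could be derived from the per-sub-server states. The following is a minimal sketch under that assumption; the `state` type, its constants, and `overall` are hypothetical and are not part of litd's status manager.

```go
package main

import "fmt"

// state is a hypothetical status value. "degraded" is the addition suggested
// above; the other three mirror the existing statuses named in this issue.
type state string

const (
	stateRunning  state = "running"
	stateStopped  state = "stopped"
	stateErrored  state = "errored"
	stateDegraded state = "degraded"
)

// overall derives one litd-level state from per-sub-server states. It assumes
// litd itself is still running, so any sub-server that is not running makes
// the whole system degraded rather than stopped.
func overall(subServers map[string]state) state {
	for _, s := range subServers {
		if s != stateRunning {
			return stateDegraded
		}
	}
	return stateRunning
}

func main() {
	fmt.Println(overall(map[string]state{"lnd": stateErrored, "tapd": stateRunning}))
}
```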
Remote Mode
This change primarily affects integrated mode. Remote mode already has lnd running separately, so the interceptor coupling doesn't apply.
Backward Compatibility
- External behavior change: litd will no longer exit when lnd stops in integrated mode
- Operators who rely on litd exiting when lnd stops may need to update their monitoring
- Consider a config flag to preserve the old behavior if needed
Acceptance Criteria
- litd continues running after lnd stops in integrated mode
- Status endpoint remains accessible and reports lnd's stopped state
- OS signals (SIGTERM/SIGINT) still shut down both lnd and litd
- litd's StopDaemon RPC still shuts down both lnd and litd
- No regression in remote mode behavior
- Integration tests cover the new behavior
- Documentation updated to reflect the change