Description
The keepAlive goroutine in pkg/driver/wsl2/vm_windows.go (and its sibling writer in provisionVM in the same file) writes to an unbuffered errCh without a context-aware select, so the writer blocks indefinitely once the hostagent stops draining errCh during shutdown. This is the same shape of bug fixed for the VZ driver in #4922 and for the WSL2 hot-loop in #4892, but on a different code path that those PRs did not touch.
// pkg/driver/wsl2/vm_windows.go
func keepAlive(ctx context.Context, distroName string, errCh chan<- error) {
	keepAliveCmd := exec.CommandContext(ctx, "wsl.exe", "-d", distroName, "bash", "-c",
		"nohup sleep 2147483647d >/dev/null 2>&1")
	go func() {
		if err := keepAliveCmd.Run(); err != nil {
			errCh <- fmt.Errorf("error running wsl keepAlive command: %w", err)
		}
	}()
}
errCh is allocated unbuffered in (*LimaWslDriver).Start (wsl_driver_windows.go:245):
errCh := make(chan error)
It has exactly one consumer, in (*HostAgent).startRoutinesAndWait:
select {
case driverErr := <-errCh:
	logrus.Infof("Driver stopped due to error: %q", driverErr)
case sig := <-a.signalCh:
	logrus.Infof("Received %s, shutting down the host agent", osutil.SignalName(sig))
}
// after this point, no more reads from errCh
if closeErr := a.close(); closeErr != nil { ... }
cancelHA()
return a.driver.Stop(ctx)
The race on every limactl stop of a WSL2 instance:
- SIGTERM lands; the outer select picks the signalCh arm. Nobody will read errCh again.
- cancelHA() cancels the driver ctx → exec.CommandContext sends SIGKILL to the wsl.exe subprocess.
- keepAliveCmd.Run() returns with signal: killed.
- The goroutine reaches errCh <- fmt.Errorf(...).
- Send blocks forever: unbuffered channel, no consumer, no case <-ctx.Done() fallback.
The goroutine remains parked on chan send for the rest of the hostagent process's lifetime, retaining the captured *exec.Cmd plus its stdio pipes.
The same hazard exists on the errCh <- fmt.Errorf(...) in provisionVM's goroutine (also vm_windows.go).
Reproduction (deterministic)
limactl start --vm-type=wsl2 --name=demo template://default-windows
# wait for ready
limactl stop demo
The bug is structural — unbuffered channel + no consumer + no ctx-aware send — so it fires on every stop. In an in-process test that cancels the driver ctx without exiting the process, a pprof.Lookup("goroutine") snapshot shows one goroutine parked at chan send in vm_windows.go:keepAlive.func1.
Fix
Mirror the VZ fix in #4922 inside the WSL2 driver:
- Add a trySendErr(ctx, errCh, err) helper that selects on ctx.Done() so writers cannot block once the consumer is gone.
- Use it at both writers in keepAlive and provisionVM.
- Buffer errCh to size 2 in (*LimaWslDriver).Start so the first shutdown-time error is captured even before trySendErr is reached.
PR follows.
Environment
Affects every WSL2 instance (vmType: wsl2) on Windows. Found by code inspection against master.