external/server: close driver.stderr.log on Stop so Windows can delete it#4959

Open
mn-ram wants to merge 1 commit into lima-vm:master from mn-ram:fix/external-driver-logfile-leak

Conversation

mn-ram (Contributor) commented May 11, 2026

Closes #3736.

Symptom (from the issue)

On Windows, after stopping a WSL2 instance, limactl rm --force <inst> fails with:

failed to remove "C:\Users\…\.lima\<inst>": remove
C:\Users\…\.lima\<inst>\driver.stderr.log:
The process cannot access the file because it is being used by another process.

Root cause

server.Start in pkg/driver/external/server/server.go opens driver.stderr.log via os.OpenFile and hands the *os.File to the external-driver subprocess as cmd.Stderr:

logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
...
cmd.Stderr = logFile

The parent (limactl) never closes this handle. On POSIX, unlink succeeds while a file descriptor is still open; on Windows, an open HANDLE that was not opened with FILE_SHARE_DELETE blocks DeleteFile, and that is the case for files opened through os.OpenFile. server.Stop sent SIGTERM but never waited on the subprocess and never closed the parent's copy of the log file handle. When the cleanup code in limactl rm --force then tried to delete the instance directory, the parent process still held the file open and Windows refused to delete it.

The same code path serves every external driver (wsl2, qemu, vz, krunkit), but vz and krunkit only run on macOS, so on Windows this affects wsl2 and qemu. The user-visible damage is worst on wsl2 because that's the Windows default.

Fix

  1. Add LogFile *os.File to registry.ExternalDriver so Stop can find the handle.
  2. In Stop, after sending SIGTERM, call cmd.Wait() (so the kernel reaps the child and detaches its end of the inherited handle), then explicitly Close() the parent's handle.
  3. In Start, add a defer that closes logFile on every failure path between OpenFile and the success-path assignment to extDriver.LogFile; the defer is disarmed (logFile = nil) once ownership is transferred. Without this, every failed Start was leaking the FD.
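
The three steps above can be sketched in isolation as follows. This is a minimal illustration, not the code from the PR: externalDriver, start, stop, and demo are hypothetical names standing in for registry.ExternalDriver, server.Start, and server.Stop, and stop uses Process.Kill only because it is portable, whereas the PR sends SIGTERM.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// externalDriver is a hypothetical stand-in for registry.ExternalDriver;
// only the fields relevant to the leak are modeled.
type externalDriver struct {
	Command *exec.Cmd
	LogFile *os.File
}

// start opens the log, launches the driver, and hands the handle to d only on
// success. Every earlier return closes it via the deferred function.
func start(d *externalDriver, logPath, name string, args ...string) error {
	logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	defer func() {
		if logFile != nil { // still armed: we are on a failure path
			_ = logFile.Close()
		}
	}()

	cmd := exec.Command(name, args...)
	cmd.Stderr = logFile
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("starting driver: %w", err)
	}

	d.Command, d.LogFile = cmd, logFile
	logFile = nil // disarm: d now owns the handle
	return nil
}

// stop terminates the child, waits for it, then closes the parent's handle.
// Kill is used here only because it is portable; the PR sends SIGTERM.
func stop(d *externalDriver) error {
	if d.Command != nil && d.Command.Process != nil {
		_ = d.Command.Process.Kill()
		_ = d.Command.Wait() // reap the child before releasing the log
	}
	if d.LogFile == nil {
		return nil
	}
	err := d.LogFile.Close()
	d.LogFile = nil
	return err
}

// demo exercises the failure path of start and the close in stop without
// needing a real driver binary. It returns nil when both behave as described.
func demo() error {
	dir, err := os.MkdirTemp("", "driver-log-demo")
	if err != nil {
		return err
	}
	logPath := filepath.Join(dir, "driver.stderr.log")

	// Failure path: the binary does not exist, so start must not keep the handle.
	d := &externalDriver{}
	if err := start(d, logPath, "no-such-driver-binary-for-demo"); err == nil {
		return errors.New("expected start to fail")
	}
	if d.LogFile != nil {
		return errors.New("failed start left LogFile set")
	}

	// Stop path: a driver that owns an open log must end up with it closed.
	f, err := os.OpenFile(logPath, os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	d = &externalDriver{LogFile: f}
	if err := stop(d); err != nil {
		return err
	}
	if _, err := f.WriteString("x"); !errors.Is(err, os.ErrClosed) {
		return errors.New("log file still open after stop")
	}

	// With every handle closed, the directory can be removed on any OS.
	return os.RemoveAll(dir)
}

func main() {
	if err := demo(); err != nil {
		fmt.Fprintln(os.Stderr, "demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```

Setting the local logFile to nil after the assignment keeps a single cleanup site for all failure paths, so a new early return added to start later cannot reintroduce the leak.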

+32 / 0, two files. No API change, no architecture change.

Test plan

  • gofmt -l pkg/driver/external/server/server.go pkg/registry/registry.go — clean
  • go vet ./pkg/driver/external/... ./pkg/registry/... — clean
  • go build ./pkg/driver/external/... ./pkg/registry/... ./cmd/limactl/... — clean
  • GOOS=windows GOARCH=amd64 go build ./pkg/driver/external/... ./pkg/registry/... — clean
  • go test ./pkg/registry/... — pass
  • CI: Windows tests (WSL2, QEMU)
  • Manual on a Windows host:
    limactl start --vm-type=wsl2 --name=demo template://default-windows
    limactl stop demo
    limactl rm --force demo
    # Before: fails with "The process cannot access the file because it is being used by another process"
    # After: succeeds and the instance directory is fully removed.

…e it

On Windows, `limactl rm --force` failed for instances using an external
driver (wsl2, qemu, vz, krunkit) with:

    failed to remove "C:\\Users\\…\\.lima\\<inst>": remove
    C:\\Users\\…\\.lima\\<inst>\\driver.stderr.log:
    The process cannot access the file because it is being used by another process.

Root cause: server.Start opens driver.stderr.log via os.OpenFile and hands
it to the external-driver subprocess as cmd.Stderr, but the *os.File
handle in the parent (limactl) was never closed. POSIX is happy to delete
files that are still open in some process; Windows is not — any open
HANDLE blocks DeleteFile.

server.Stop sent SIGTERM but did not Wait() the subprocess or close the
inherited log file, so by the time limactl proceeded to remove the
instance directory the parent still held the handle open.

Fix:

  * Store the *os.File on ExternalDriver as LogFile so Stop can find it.
  * In Stop, after SIGTERM, Wait() the subprocess (reaping the child also
    detaches its inherited handle on Windows) and then explicitly Close
    the LogFile in the parent.
  * In Start, register a defer that closes LogFile on any failure path
    between OpenFile and the moment ownership is transferred to
    extDriver; disarm it once extDriver.LogFile is set. Without this,
    every failed start leaked the FD.

Closes: lima-vm#3736
Signed-off-by: mn-ram <235066282+mn-ram@users.noreply.github.com>

unsuman added the area/vmdrivers (VM driver infrastructure) label May 11, 2026

Development

Successfully merging this pull request may close these issues.

Unable to stop instance on Windows

2 participants