Windows: support native OpenSSH; no Git for Windows / MSYS2 required #4885
jandubois wants to merge 21 commits into
Lima calls cygpath to translate key/socket paths before invoking ssh-keygen and ssh on Windows. This assumes a Cygwin-based ssh (Git for Windows, MSYS2). When only native Windows OpenSSH is installed, cygpath is unavailable and limactl create fails immediately with:

    failed to convert path to mingw, maybe not using Git ssh? exec: "cygpath": executable file not found in %PATH%

Detect the ssh flavor by checking whether cygpath.exe lives alongside ssh.exe (the layout used by Git for Windows and MSYS2). For native Windows OpenSSH, pass paths with forward slashes (C:/Users/...), which native ssh-keygen, ssh, and sshd accept. For Cygwin-based ssh the existing cygpath-based behavior is preserved, so users with Git for Windows see no change.

This unblocks limactl create on plain Windows. Full end-to-end use of native Windows OpenSSH still requires a non-ControlMaster path for dynamic port forwarding (hostagent uses ssh -O forward/cancel), since Win32-OpenSSH does not implement SSH multiplexing (PowerShell/Win32-OpenSSH#1328, still open as of Feb 2026). That is a separate change.

Related: lima-vm#4819

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
Two related changes used together by callers that talk to the ssh family of binaries on Windows:

- ParseOpenSSHVersion: also match "OpenSSH_for_Windows_X.YpZ" (the version banner emitted by native Windows OpenSSH). Previously the regex required a digit immediately after "OpenSSH_", so Win32-OpenSSH was misdetected as version 0.0.0 and Lima then treated it as pre-8.0 legacy ssh in code paths that branch on the version (e.g. scp URL form).
- PathForSSH: rename from the previously unexported pathForSSH and export it. copytool.parseCopyPaths needs the same path-translation logic, and duplicating the cygpath-vs-native decision in two packages would invite drift.

Add a test for the Win32-OpenSSH version banner.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
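To illustrate the banner problem, the following sketch shows one way a regex can accept both banner shapes. The pattern, the version type, and parseBanner are assumptions made for this example; the actual regex in ParseOpenSSHVersion may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// bannerRe is an illustrative pattern, not the one used in Lima. The optional
// "for_Windows_" group is what lets the Win32-OpenSSH banner match.
var bannerRe = regexp.MustCompile(`^OpenSSH_(?:for_Windows_)?(\d+)\.(\d+)(?:p(\d+))?`)

// version is a hypothetical holder for the parsed X.YpZ components.
type version struct{ Major, Minor, Patch int }

// parseBanner returns ok=false when the banner does not match, instead of
// silently reporting 0.0.0.
func parseBanner(banner string) (version, bool) {
	m := bannerRe.FindStringSubmatch(banner)
	if m == nil {
		return version{}, false
	}
	var v version
	v.Major, _ = strconv.Atoi(m[1]) // digits only, guaranteed by the regex
	v.Minor, _ = strconv.Atoi(m[2])
	if m[3] != "" {
		v.Patch, _ = strconv.Atoi(m[3])
	}
	return v, true
}

func main() {
	for _, b := range []string{
		"OpenSSH_for_Windows_9.5p1, LibreSSL 3.8.2",
		"OpenSSH_9.6p1 Ubuntu-3ubuntu13, OpenSSL 3.0.13",
	} {
		v, ok := parseBanner(b)
		fmt.Println(v, ok)
	}
}
```

Returning an explicit ok flag is a design choice of this sketch: it makes the "could not parse" case visible to the caller rather than letting it look like a very old ssh.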
Three changes that together let limactl copy work on Windows when only native Windows OpenSSH is installed (no Git for Windows, no MSYS2).

- parseCopyPaths: route absolute Windows paths (e.g. C:\Users\jan\file) through sshutil.PathForSSH so that for native ssh the path becomes C:/Users/jan/file (forward slashes) instead of failing on a missing cygpath. Detect Windows drive-letter paths via filepath.VolumeName *before* splitting on ":" so the drive letter is not mistaken for an instance name in the "instance:path" form.
- scp.go, rsync.go: strip ControlMaster / ControlPath / ControlPersist from the ssh options on Windows. Native Windows OpenSSH does not implement SSH multiplexing, so leaving these options in caused scp to fail with "getsockname failed: Not a socket" before transferring any bytes. Cygwin-based ssh has known reliability issues with sftp over a mux socket, so stripping unconditionally on Windows is consistent with how hostagent and limactl shell already handle this.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
Use compress/gzip directly instead of shelling out to a gzip binary. On Windows, gzip is not part of the base system, so the previous behaviour required Git for Windows or MSYS2 to be on PATH just to unpack a .tar.gz image during limactl create / start. Other formats (bzip2, xz, zstd) still go through the exec path, since they are less common in Lima image URLs and would need extra dependencies for in-process decompression. They can be migrated similarly in follow-ups if needed. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
ioutilx.WindowsSubsystemPath: keep cygpath as the preferred backend (it respects any custom fstab the user has configured for MSYS2 / Git for Windows), but add a native fallback for the common drive-letter case (C:\Users\jan -> /c/Users/jan). Without the fallback, plain Windows installs that have neither Git for Windows nor MSYS2 hit a fatal error during fillDefault when computing the default mountPoint for a host mount. After this change, the default mountPoint is computed correctly without external tooling. hostagent.setupMount: switch the host-path translation from ioutilx.WindowsSubsystemPath to sshutil.PathForSSH. The path is consumed by the sftp-server that sshocker spawns, and the format that binary expects depends on toolchain: Cygwin sftp-server (Git for Windows / MSYS2) wants Cygwin paths, native Windows sftp-server wants native forward-slash paths. PathForSSH already encodes that decision. Verified end-to-end on Windows 11 with QEMU 10.2.0 and only native Windows OpenSSH on PATH (no Git for Windows, no MSYS2): reverse-sshfs mounts a host directory into the guest, both sides see the same files, read and write both work, and the host-side sftp-server is the ssh-built-in C:\Windows\System32\OpenSSH\sftp-server.exe (auto-detected by sshocker via exec.LookPath). Signed-off-by: Jan Dubois <jan.dubois@suse.com>
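The drive-letter fallback is small enough to sketch. The function below is an illustration only, not the code in ioutilx: nativeSubsystemPath and isLetter are hypothetical names, and the choice to reject anything that is not an absolute drive-letter path is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
)

func isLetter(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// nativeSubsystemPath (hypothetical) converts an absolute drive-letter path
// such as C:\Users\jan into the /c/Users/jan form without calling cygpath.
// It does not consult any MSYS2 fstab, which is why the commit keeps cygpath
// as the preferred backend when it is available.
func nativeSubsystemPath(orig string) (string, error) {
	if len(orig) < 3 || !isLetter(orig[0]) || orig[1] != ':' ||
		(orig[2] != '\\' && orig[2] != '/') {
		return "", fmt.Errorf("not an absolute drive-letter path: %q", orig)
	}
	drive := strings.ToLower(orig[:1])
	rest := strings.ReplaceAll(orig[2:], `\`, "/") // rest starts with a separator
	return "/" + drive + rest, nil
}

func main() {
	p, err := nativeSubsystemPath(`C:\Users\jan`)
	fmt.Println(p, err)
}
```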
Three changes, intended as a research signal rather than a finished re-engineering of the Windows test matrix. windows-wsl2 and windows-qemu: drop the _LIMA_WINDOWS_EXTRA_PATH setting (and the corresponding entry in MSYS2_ENV_CONV_EXCL). After the prior commits in this branch, limactl no longer needs anything from C:\Program Files\Git\usr\bin for its core flow on Windows. The test scripts (hack/test-templates.sh) still run under MSYS2 bash and still rely on cygpath / awk / netcat from C:\msys64\usr\bin, so that PATH entry stays. Expected: existing tests pass unchanged, since the removed entry was strictly additive for limactl. windows-plain: new job that builds with `go build` and runs a minimal PowerShell smoke test (create / start / shell / copy / stop / delete) with PATH scrubbed of MSYS2 and Git for Windows. The PATH scrub is deliberately aggressive: if any new step starts requiring something from those toolchains, this job will fail and we will know about it. Verified locally on a Windows 11 host with only native OpenSSH; running on the GitHub Windows runner is the actual signal. Behaviour changes intended to break loudly, not silently. If the new job fails, the failure mode itself is the data we want. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
wsl2.md: drop the "Windows doesn't ship with ssh.exe, gzip.exe, etc." bullet, which has been incorrect since Windows 10 build 1803 (2018) for ssh and is now incorrect for gzip too (Lima decompresses gzip in pure Go since the prior commits in this branch). Replace with a section that describes the current behaviour: native OpenSSH is used when available; Git for Windows / MSYS2 is detected and used when present (so users with custom MSYS2 fstab entries see the existing cygpath behaviour); plain Windows works without either. environment-variables.md: add a historical-context note to _LIMA_WINDOWS_EXTRA_PATH explaining that it was originally needed to make Git-for-Windows binaries reachable to limactl, and is no longer required for the core flow. Keep the variable documented since it can still be useful for niche scenarios and the implementation in cmd/limactl/main.go remains. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
Logging additions: surface the decisions Lima makes that previously went unmarked, so a `--debug` run on a failing host shows enough to diagnose without source-diving.

- sshutil.IsSSHCygwin: log the toolchain detection result once per process at Debug level, including the full ssh.exe path and whether cygpath.exe was found alongside. Caches the result in a sync.Once, so the log fires exactly once even when many call sites consult it.
- ioutilx.WindowsSubsystemPath: when the native fallback is taken (cygpath unavailable), log the input -> output mapping at Debug, so unexpected drive-letter conversions are visible in the trace.
- downloader.decompressLocal: change "decompressing X with gzip" to either "with in-process gzip" or "with external <cmd>" depending on which path was taken. The previous message was misleading after the in-process gzip change because it still said "with gzip" for both.
- copytool.scp / rsync: log Debug when ControlMaster options are stripped on Windows, so a copy-failure trace makes the mux decision visible without reading the source.
- hostagent.setupMount: log the resolved sftp-server LocalPath at Debug, so reverse-sshfs failures surface what was actually passed.

CI: the windows-plain job now produces a tool inventory so failures have actionable context.

- Print PATH before and after the Cygwin/MSYS2/Git-for-Windows scrub.
- Enumerate every external binary Lima might shell out to on Windows, classified into required (must resolve and must come from C:\Windows\...), forbidden (must not resolve at all -- cygpath, pacman), and optional (logged for context, e.g. rsync, gzip, qemu-img).
- Fail with an actionable message if a required tool is missing or a forbidden one is found, so the CI failure points at the scrub regex rather than the smoke test that follows.
- Pass --debug to every limactl invocation in the smoke test, and add an "if: failure()" step that dumps ha.stderr.log, ha.stdout.log, serial.log, lima.yaml, and ssh.config from the instance directory.
- Persist the scrubbed PATH into $GITHUB_ENV so the smoke test and the failure-dump step run against the same environment as the verification step.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
revive's indent-error-flow check (in CI) flagged the if/else where the if branch returns. The else block only existed to give the cygpath err a name visible after the conditional; lifting the assignment out of the if-init achieves the same thing without the linter complaint and is also a touch easier to read. No behaviour change. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
The //nolint:staticcheck directive at client.go:35 has been intermittently flagged as unused by nolintlint when running on the windows-2025 lint job, even though the underlying code, the grpc dependency, and the directive itself are unchanged. The behaviour is sensitive to golangci-lint's analysis cache state: master has been passing this check, but small changes elsewhere in the module can shift the analyzer scheduling enough that staticcheck's SA1019 deprecation finding disappears from the input to nolintlint, which then reports the directive as unused. Add nolintlint to the directive so it self-suppresses, and document the reason inline so the next person who looks at this knows why. The grpc.Dial -> grpc.NewClient migration is a separate concern that would remove the need for the directive entirely; out of scope here. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
nolintlint flags //nolint directives whose target lint check did not report an issue at the same line, on the assumption that the directive is now stale. In practice the underlying linters' findings are sensitive to golangci-lint's analysis cache and analyzer scheduling, which can vary across platforms and across PRs even when the relevant source has not changed. The result is occasional spurious lint failures on CI for code that nobody touched. Disable the linter and revert the workaround that the previous commit applied to pkg/driver/external/client/client.go. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
The first run of windows-plain reported success in 2 minutes, which turned out to be entirely fictitious: the smoke test never actually exercised any Lima command end-to-end. Two compounding bugs:

- Build step only built limactl.exe, not the per-arch guest agent that limactl looks up at start time. Result: limactl create exits 1 with "guest agent binary could not be found for Linux-x86_64" before touching the VM. Add a second go build invocation that produces _output/share/lima/lima-guestagent.Linux-x86_64 with the same env (CGO_ENABLED=0 GOOS=linux GOARCH=amd64) the Makefile uses.
- Smoke step relied on $ErrorActionPreference = 'Stop' to abort on errors, but PowerShell's Stop preference does not catch non-zero exits from external commands, only PowerShell's own terminating errors. So when limactl create failed, the script kept going through start, shell, copy, stop, delete; each of those also failed but the job exited 0. Wrap every limactl invocation in an Invoke-Limactl helper that throws on $LASTEXITCODE != 0, so the run is honest about which step actually broke.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
This experimental Windows-only env var prepended a user-supplied directory to PATH inside limactl. It existed to inject Git for Windows or MSYS2 binaries without altering the user shell's PATH, back when Lima required a Cygwin-style toolchain for ssh, scp, ssh-keygen, and cygpath. After this branch's earlier commits, limactl works directly with native Windows OpenSSH and no longer needs anything from those toolchains. The variable served no purpose for the core flow and its leading underscore signalled that no compatibility was promised. Drop the implementation in cmd/limactl/main.go and the corresponding docs entry. The CI invocations were already removed earlier in this branch. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
Three changes that together verify reverse-sshfs works on Windows in both supported toolchain configurations. windows-plain becomes windows-plain-wsl2 (rename only). Instance name and LIMA_HOME path follow. Add windows-plain-qemu mirroring the WSL2 sibling but with the QEMU driver. Same PATH-scrub-and-tool-inventory shape, same fail-loudly PowerShell wrapper around limactl. qemu-img and qemu-system-x86_64 move from optional to required in the inventory; the scrub keeps the \Program Files\QEMU directory after the Cygwin/MSYS2 entries are removed. Smoke test creates from templates/default.yaml (the same template windows-qemu uses), runs uname, and does a reverse-sshfs round-trip equivalent to hack/test-mount-home.sh: write a random string to a file in $USERPROFILE/lima-test-tmp on the host, read it via the mount in the guest, compare. hack/test-templates.sh: drop the line that disabled the mount-home check on Windows/Msys. The skip's comment cites a "failed to confirm whether /c/Users/runneradmin [remote] is successfully mounted" CI failure that pre-dates the path-translation and toolchain-detection work in this branch. Re-enabling exercises the Cygwin/MSYS2 sftp-server path of windows-qemu, the parallel of windows-plain-qemu's native sftp-server path, so we get coverage on both sides of the fork. If either side fails, that failure is the data we want. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
The first windows-plain-qemu run failed at create time with:

    fatal: template "_images/ubuntu.yaml" not found

templates/default.yaml uses `base: template:_images/ubuntu` and `base: template:_default/mounts`, which Lima resolves via <prefix>/share/lima/templates/. Existing windows-qemu populates that directory by running `make`, whose TEMPLATES target does `cp -aL` of templates/ into _output/share/lima/templates/. windows-plain-qemu deliberately avoids `make` (so the build does not require MSYS2 make or bash), so the destination directory was empty and Lima could not find the base templates.

Replicate the install in PowerShell. Recursive Copy-Item handles the plain files. The wrinkle is that templates/_images/<distro>.yaml and templates/<distro>.yaml are tracked as git symlinks (mode 120000), which Windows checks out as 17-byte plaintext stubs containing the target filename because core.symlinks defaults to false. We detect them via `git ls-tree`, follow the chain (opensuse.yaml -> opensuse-leap.yaml -> opensuse-leap-16.yaml is the deepest current chain), and overwrite the stub with the resolved file's contents.

windows-plain-wsl2 does not need this; it uses templates/experimental/wsl2.yaml which has a self-contained images: section and no base: references.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
The windows-plain-qemu "Install templates" step assumed the working tree represented git symlinks as 17-byte plaintext stubs (the core.symlinks=false case). The GitHub Windows runner actually preserves them as real NTFS symlinks, so Get-Content on the stub path returned the target file's YAML content and the resolver threw:

    Stub templates\ubuntu-lts.yaml points at minimumLimaVersion: 2.0.0 base: - template:_images/ubuntu-24.04 - template:_default/mounts which does not exist at templates\minimumLimaVersion: 2.0.0 ...

Read symlink targets directly from the git object store via `git cat-file blob <sha>` instead of probing the working tree. The resolution is now independent of whether core.symlinks is true (runner) or false (plain Windows checkout).

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
1218a5d removed the "mount-home skipped on Msys" line from hack/test-templates.sh and added an inline reverse-sshfs round-trip to windows-plain-qemu, on the hypothesis that the path-translation work in this branch would address the pre-existing failure the skip was there for. It does not: both jobs fail with [hostagent] fusermount3: mount failed: Permission denied inside the Ubuntu 25.10 guest, with /etc/fuse.conf already containing user_allow_other (the hostagent's pre-mount requirement check passes before the mount is attempted). Same symptom on MSYS2 sftp-server and on native Windows sftp-server, so the failure is below the toolchain boundary — something in ubuntu-25.10 + fuse3 + AppArmor's unprivileged-userns restriction. Tracked for a separate follow-up. Restore the Msys skip and drop the inline round-trip block. The remaining windows-plain-qemu smoke (create / start / shell / copy / stop / delete via native OpenSSH with sftp-server autodetection on PATH) still covers the native-Windows-OpenSSH work that is the subject of this branch. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
Grouped fixes from the review of PR lima-vm#4885 that all live in the Windows-SSH code paths. copytool.parseCopyPaths (review I2): the Windows local-abs detection used filepath.VolumeName, which returns "C:" for the drive-relative path "C:foo" as well. That shadowed single-letter instance names: "limactl copy C:foo ." on an instance named "C" was silently mis-routed through PathForSSH instead of the colon-split. Switch to filepath.IsAbs, which correctly returns false for "C:foo" and keeps absolute forms (C:\foo, C:/foo, UNC) classified as local. Add a table test pinning the three cases plus an explicit instance:path form. copytool.checkRsyncOnGuest (review I1): the probe built sshOpts and ran ssh without stripping ControlMaster/ControlPath/ControlPersist on Windows, which both rsyncTool.Command and scpTool.Command already do. The probe therefore hit the same mux-socket failure the Command path works around, so a working rsync install on native Windows OpenSSH was rejected before "command -v rsync" ran on the guest. Mirror the mux-strip on Windows. sshutil.IsSSHCygwin (review S1, S3): the sync.Once cache keyed the answer to the first caller's sshExe, so a future caller that wants to re-detect after an SSH swap (e.g. a test exercising both branches) cannot. Replace with a map keyed by the resolved absolute path, and filepath.EvalSymlinks the path before the directory check so a chocolatey/scoop shim does not throw off the cygpath.exe sibling probe. sshutil.ParseOpenSSHVersion (review S4): the 0.0.0 fallback silently downgrades version-gated behaviour (cipher selection, scp URL form). Log the unparsed banner at Debug so the cause is traceable. ioutilx.WindowsSubsystemPath (review S8): the native cygpath fallback assumed orig[2:] begins with a separator, which is true for today's callers (all absolute). Harden it so a future caller passing the drive-relative "C:foo" does not get "/cfoo" back. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
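The I2 fix is easiest to see with the three path shapes side by side. Because filepath.IsAbs only applies Windows rules when compiled for Windows, this sketch uses a hand-written approximation (isWindowsAbs) so it behaves the same on any host. Both function names are hypothetical, and the approximation is not guaranteed to match filepath.IsAbs in every corner case.

```go
package main

import (
	"fmt"
	"strings"
)

func isLetter(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// isWindowsAbs approximates filepath.IsAbs on Windows: a drive letter, a
// colon, and a separator (C:\foo, C:/foo), or a UNC prefix. The
// drive-relative form "C:foo" is deliberately not absolute.
func isWindowsAbs(p string) bool {
	if strings.HasPrefix(p, `\\`) {
		return true
	}
	return len(p) >= 3 && isLetter(p[0]) && p[1] == ':' &&
		(p[2] == '\\' || p[2] == '/')
}

// splitCopyArg (hypothetical) classifies a limactl copy argument: absolute
// Windows paths are local; otherwise the first colon separates the instance
// name from the guest path.
func splitCopyArg(arg string) (instance, path string) {
	if isWindowsAbs(arg) {
		return "", arg
	}
	if inst, rest, ok := strings.Cut(arg, ":"); ok {
		return inst, rest
	}
	return "", arg
}

func main() {
	for _, a := range []string{`C:\foo`, "C:/foo", "C:foo", "default:/tmp/x"} {
		inst, p := splitCopyArg(a)
		fmt.Printf("%-16q instance=%q path=%q\n", a, inst, p)
	}
}
```

With this ordering, "C:foo" falls through to the colon-split and is read as path "foo" on an instance named "C", which is the behaviour the review asked for.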
Review S6. The path-space chain walker in the "Install templates" step silently swallows '..' underflow: a symlink chain with enough '..' segments would resolve to a path outside templates/ and Copy-Item would cheerfully copy whichever file is there. Lima's current templates don't have any such symlinks, so this is defensive only — but the loop is easy to trip over on a future template edit. Add a post-resolution check that the resolved path still starts with the source root, and throw otherwise. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
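The actual check lives in the PowerShell "Install templates" step; the Go sketch below only illustrates the same post-resolution containment test. resolveWithin is a hypothetical name, forward-slash paths are assumed for simplicity, and the file names in main are made up for the example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolveWithin (hypothetical) resolves a symlink target relative to the
// directory that holds the link, then verifies that the cleaned result is
// still inside root. A target with enough ".." segments is rejected instead
// of silently resolving outside the tree.
func resolveWithin(root, linkDir, target string) (string, error) {
	cleanRoot := path.Clean(root)
	resolved := path.Join(cleanRoot, linkDir, target) // Join also cleans ".."
	if resolved != cleanRoot && !strings.HasPrefix(resolved, cleanRoot+"/") {
		return "", fmt.Errorf("symlink target %q escapes %q", target, root)
	}
	return resolved, nil
}

func main() {
	fmt.Println(resolveWithin("templates", "_images", "../example.yaml"))
	fmt.Println(resolveWithin("templates", "", "../../outside.yaml"))
}
```

Comparing against cleanRoot plus a trailing separator, rather than a bare prefix, avoids accepting a sibling directory whose name merely starts with the root's name.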
Review S9: the previous wording said native Windows OpenSSH includes ssh, scp, ssh-keygen, and sftp-server and ships on Windows 10 build 1803+. Correct only for the client components. sftp-server is part of OpenSSH Server, which is a separate Feature on Demand and not installed by default. Lima's QEMU + reverse-sshfs path needs it; WSL2 does not. Spell the distinction out and link the MS install doc. Review S7: hostagent/mount.go recomputes the reverse-sshfs LocalPath via PathForSSH on every start, while defaults.go resolves the default MountPoint once at create via WindowsSubsystemPath. A user who creates with Git for Windows on PATH and then starts without it (or vice versa) would see LocalPath change shape between restarts without warning. Document the constraint: pick one ssh toolchain and stick with it for an instance's lifetime. Persisting LocalPath at create time would fix this at the code layer but is out of scope here. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
revive's redefines-builtin-id flagged 'real' as shadowing the Go built-in (the complex-number function). Caught by the Lint Go job on all three platforms after 7f760ab added the EvalSymlinks call. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
For the record, the round 1 review mentioned in the summary is https://jandubois.github.io/lima/20260424-190657-pr-4885.html
Thanks for the solid foundation here, @jandubois! I’ll be testing and reviewing this locally on Windows to help provide a status signal for the research. I’m also volunteering to pick up the 'pre-existing limitations' for #4819 as follow-up PRs. Specifically, I'm exploring a pure-Go Excited to help move this toward the finish line!
@jandubois - I just tested it locally and here is the report. Verification Report: Windows Native Path Translation Test Environment: Host OS: Windows (Native) Isolation: Isolated Dependency Check: Methodology: Key Findings: Captured Log Evidence:
Thanks for the feedback @liketosweep! I would like to hear from other maintainers, and from @arixmkii if this PR looks sensible in principle, and if I should spend the effort on cleaning it up and creating one or more actual mergeable PRs from it. Probably won't have time for it for a little bit, but once I know we want to do this, I can work on it when time opens up.
Thank you, @jandubois. I completely agree with holding off until there is consensus from the other maintainers and @arixmkii on the architectural direction. I will monitor the discussion here and remain available if further local testing is needed once a decision is reached.
I tested the PR on my Windows machine (Windows 11 Home 25H2). No MSYS2, no Cygwin, no Git for Windows. Here are some key observations (we'll have to see if they are universal):
The same binary works perfectly when placed into
Once these are out of the way, I can finally run a QEMU instance: QEMU log
For wsl2, using the
$env:GOARCH = 'amd64'
go build -o _output\share\lima\lima-guestagent.Linux-x86_64 .\cmd\lima-guestagent
if ($LASTEXITCODE -ne 0) { throw "lima-guestagent build failed: $LASTEXITCODE" }
- name: Scrub PATH and verify only native Windows toolchain is reachable
Can't we just uninstall Cygwin/MSYS2/Git-for-Windows from the CI env?
Grouped fixes from the review of PR lima-vm#4885 that all live in the Windows-SSH code paths. copytool.parseCopyPaths (review I2): the Windows local-abs detection used filepath.VolumeName, which returns "C:" for the drive-relative path "C:foo" as well. That shadowed single-letter instance names: on an instance named "C", "limactl copy C:foo ." silently went to PathForSSH instead of the colon-split. Switch to filepath.IsAbs, which correctly returns false for "C:foo" and keeps absolute forms (C:\foo, C:/foo, UNC) classified as local. Add a table test pinning the three cases plus an explicit instance:path form. copytool.checkRsyncOnGuest (review I1): the probe built sshOpts and ran ssh without stripping ControlMaster/ControlPath/ControlPersist on Windows, which both rsyncTool.Command and scpTool.Command already do. The probe therefore hit the same mux-socket failure the Command path works around, so checkRsyncOnGuest rejected a working rsync install on native Windows OpenSSH before "command -v rsync" ran on the guest. Mirror the mux-strip on Windows. sshutil.IsSSHCygwin (review S1, S3): the sync.Once cache keyed the answer to the first caller's sshExe, so a future caller that wants to re-detect after an SSH swap (e.g. a test exercising both branches) cannot. Replace with a map whose key is the resolved absolute path; EvalSymlinks the path before the directory check so a chocolatey or scoop shim does not throw off the cygpath.exe sibling probe. sshutil.ParseOpenSSHVersion (review S4): the 0.0.0 fallback silently downgrades version-gated behaviour (cipher selection, scp URL form). Log the unparsed banner at Debug to expose the cause. ioutilx.WindowsSubsystemPath (review S8): the native cygpath fallback assumed orig[2:] begins with a separator, which is true for today's callers (all absolute). Harden it so a future caller passing the drive-relative "C:foo" does not get "/cfoo" back. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
Superseded by #4998
DO NOT MERGE
This PR has been created with Claude Code using Opus 4.7.
It is for discussion and testing purposes only and has not been reviewed yet!
Summary
This branch makes limactl work on Windows hosts that have only the toolchain shipped in a default Windows 10/11 install (native OpenSSH, wsl.exe, tar). Lima previously required Git for Windows or MSYS2 on PATH for cygpath, ssh, ssh-keygen, scp, and gzip. After this branch, none of those external tools are required for the core flow on plain Windows.

The PR is offered as research / discussion. The goal is to surface what does and does not work end-to-end, get CI signal across the existing Windows runners plus two new "plain Windows" runners (one per driver), and decide together how much of this we want to land.
Related: #4819. The originating thread there narrowed to "drop the cygpath.exe dependency"; investigating that turned up the broader story below.

Background: what was actually required, and why
Two coupled root causes drove the historical Git-for-Windows / MSYS2 requirement:
1. Native Windows OpenSSH does not implement SSH multiplexing. Because Lima relied on ControlMaster for the legacy ssh-based dynamic port forwarder, native ssh would not work, so Cygwin-built ssh was required.
2. Cygwin-built ssh expects Cygwin-style paths (/c/Users/jan/...). Hence cygpath to translate. Hence the cascade of "why are we calling cygpath everywhere?" sites in the codebase.

Three things make the dependency easier to drop than I expected when starting:
- The default port forwarder is the gRPC-based one (pkg/portfwd/forward.go, gRPC-tunnelled via vsock). The legacy ssh-based forwarder is opt-in via LIMA_SSH_PORT_FORWARDER=true. So ControlMaster is not actually needed by the default flow. This was verified empirically: starting python3 -m http.server in the guest produces a TCP listener owned by limactl.exe, not ssh.
- Native Windows OpenSSH provides sftp-server.exe (when the OpenSSH Server optional feature is installed), and sshocker auto-detects it via LookPath("sftp-server"). So reverse-sshfs works natively too once the host-path translation is corrected.
- Native ssh accepts -F /dev/null as an empty config, the same way Cygwin ssh does. Lima's hardcoded -F /dev/null argument did not need to change.

Detection mechanism
A new sshutil.IsSSHCygwin(sshExe) checks whether cygpath.exe lives in the same directory as ssh.exe (the layout used by Git for Windows and MSYS2), after resolving symlinks so chocolatey/scoop shims do not throw the directory check off. Results are cached per resolved absolute path and logged once per path at Debug level.

sshutil.PathForSSH(ctx, sshExe, path) then dispatches:
- Cygwin-based ssh: cygpath for path translation (preserves any custom MSYS2 fstab the user has)
- Native Windows OpenSSH: filepath.ToSlash (e.g. C:/Users/jan/...), which native ssh, ssh-keygen, scp, and sftp-server accept

ioutilx.WindowsSubsystemPath (used in the few sites that produce a guest mount-point path rather than a host argument) keeps cygpath as the preferred backend but falls back to a native drive-letter conversion (C:\Users\jan → /c/Users/jan) when cygpath is unavailable, so the on-disk Lima config remains identical regardless of toolchain.

Per-commit breakdown
f21ba237 is reverted by 3e09daeb; f98222d5 partially reverts 1218a5d1; in a non-research PR these would be squashed.

Review feedback addressed (round 1)
Commit 7f760ab6 folds in fixes for the two Important issues and several Suggestions from the first review pass:
- copytool.checkRsyncOnGuest now strips the mux options on Windows. The probe previously hit the same ControlMaster failure the rsync.Command / scp.Command paths work around, so a working rsync install on native Windows OpenSSH was rejected before the guest-side command -v rsync ran.
- copytool.parseCopyPaths now uses filepath.IsAbs instead of filepath.VolumeName != "" to decide local vs remote. The previous check classified the drive-relative form C:foo as local, silently shadowing single-letter instance names. A new TestParseCopyPathsWindowsDriveLetter table test locks in C:\foo / C:/foo / C:foo / instance:path.
- sshutil.IsSSHCygwin drops its sync.Once in favour of a map keyed by the resolved absolute path, with filepath.EvalSymlinks applied before the cygpath.exe sibling check so chocolatey/scoop shims do not mis-detect.
- ParseOpenSSHVersion logs the unparsed banner at Debug when it falls back to 0.0.0, so cipher-selection and scp-URL downgrades are traceable.
- The windows-plain-qemu template-install step now asserts the resolved symlink target stays inside templates/ after the path-space chain walk. Defensive only; Lima's current templates don't use ...
- [...] LocalPath on every start; users should not swap between native Windows OpenSSH and Cygwin ssh between limactl create and limactl start on a QEMU instance with reverse-sshfs mounts. The follow-up would persist LocalPath at create time.
- ioutilx.WindowsSubsystemPath's native cygpath fallback now inserts the separator explicitly, so a hypothetical drive-relative input C:foo does not collapse to /cfoo.
- [...] sftp-server.exe; needed for QEMU + reverse-sshfs).
- parseCopyPaths now documents UNC-path behaviour inline.

Skipped by design:
- [...] _LIMA_WINDOWS_EXTRA_PATH still being set. Judged noise over value.

Deferred to follow-ups:
- Persist LocalPath at create time (design change).
- Squash f21ba237 + 3e09daeb and the similar pairs (only if/when this branch moves toward merge).

Touched call sites
- pkg/sshutil/sshutil.go: DefaultPubKeys (ssh-keygen), identityFileEntry (IdentityFile), SSHOpts (ControlPath); ParseOpenSSHVersion regex extended to match OpenSSH_for_Windows_X.YpZ; IsSSHCygwin cache reshaped per review
- pkg/copytool/copytool.go: parseCopyPaths host-path translation with filepath.IsAbs gate (corrected per review I2)
- pkg/copytool/scp.go, pkg/copytool/rsync.go: strip ControlMaster / ControlPath / ControlPersist on Windows so the underlying ssh does not try to use a mux socket that is unavailable (native) or unreliable (Cygwin). Includes checkRsyncOnGuest per review I1.
- pkg/hostagent/mount.go: reverse-sshfs LocalPath translation now matches the toolchain
- pkg/ioutilx/ioutilx.go: native cygpath fallback, plus debug logging of the conversion, hardened per review S8
- pkg/downloader/downloader.go: gzip decompression via in-process compress/gzip. Other formats still shell out
- cmd/limactl/main.go: drop the _LIMA_WINDOWS_EXTRA_PATH PATH-injection hook (no longer required for the core flow)
- .golangci.yml: disable nolintlint, which was producing flaky cross-platform "directive is unused" failures because golangci-lint's analysis cache invalidation can cause SA1019 to drop in and out of the post-processing pipeline

Tested locally on Windows 11 with QEMU 10.2.0
Hardware: Windows 11 Pro 26100, native OpenSSH OpenSSH_for_Windows_9.5p2, with PATH cleaned of Git for Windows and MSYS2 entries.

WSL2 driver (templates/experimental/wsl2.yaml, finch rootfs)
- limactl create: generates ed25519 keypair via native ssh-keygen with native paths
- limactl start: boots, runs requirements ("ssh", "user session is ready for ssh", "Explicitly start ssh ControlMaster" all pass with ControlMaster=no explicitly set on every ssh invocation)
- limactl shell uname -a → Linux 6.6.87.2-microsoft-standard-WSL2 x86_64
- Invoke-WebRequest http://127.0.0.1:8888 reaches a python3 -m http.server running in the guest. The TCP listener on the host is limactl.exe itself (PID owns the socket), confirming the gRPC/vsock forwarder is in use, not ssh -O forward
- limactl copy host → guest and guest → host
- limactl shell --workdir=/tmp
- limactl stop, limactl delete --force

QEMU driver with reverse-sshfs (vmType: qemu, mountType: reverse-sshfs)
- limactl create: default mountPoint computed correctly via the new native cygpath fallback (C:\Users\jan\qemu-share → /c/Users/jan/qemu-share)
- limactl start: sshocker auto-detects C:\Windows\System32\OpenSSH\sftp-server.exe via LookPath("sftp-server"), mount succeeds
- [...] Get-Content and bash cat

Note: this local verification used a custom qemu-share mount, not the default template's ~ mount. CI exercises the default template and hits a pre-existing, non-PR-related guest-side fusermount3: mount failed: Permission denied on Ubuntu 25.10. See "Intentionally not addressed" below.

Existing Cygwin-ssh path (Git for Windows on PATH)

limactl create + start + shell + copy all still work unchanged when Git for Windows is on PATH. IsSSHCygwin correctly identifies the toolchain, cygpath is invoked, paths come out as /c/Users/.... No regression for users who have Git for Windows installed.

Unit tests
go test ./pkg/sshutil/... ./pkg/copytool/... ./pkg/downloader/... ./pkg/ioutilx/... ./pkg/limayaml/... passes on Windows. Cross-compile of pkg/sshutil for darwin and a full Linux build of ./cmd/limactl both succeed.

Intentionally not addressed in this PR
Reverse-sshfs mount into the default template's ~ fails on GH Windows runners (guest-side, Ubuntu 25.10)

hack/test-templates.sh keeps the pre-existing skip for the default template case. An earlier commit on this branch (1218a5d1) removed that skip on the hypothesis that the new path-translation work covered the original failure. It did not: both windows-qemu (MSYS2 sftp-server) and windows-plain-qemu (native sftp-server) fail with fusermount3: mount failed: Permission denied inside the Ubuntu 25.10 guest, after /etc/fuse.conf has been populated with user_allow_other by Lima's cloud-init and the hostagent's pre-mount requirement check has confirmed it. Same symptom on both toolchains, so the failure is below the sftp-server / ssh boundary.

The guest runs with kernel.apparmor_restrict_unprivileged_userns=1 and kernel.apparmor_restrict_unprivileged_unconfined=1 (Ubuntu 23.10+ default, applied by /usr/lib/sysctl.d/10-apparmor.conf). Lima has an AppArmor profile for /usr/local/bin/rootlesskit to work around this restriction for containerd, but no equivalent for fusermount3. The leading (unconfirmed) hypothesis is that libfuse3 uses user namespaces for the mount when invoked by a non-root user with -o allow_other, and AppArmor denies that for the unconfined lima user. Audit is disabled in the kernel (audit: initializing netlink subsys (disabled)) so DENIED lines don't show in dmesg without re-enabling it.

This reproduces on the CI runners on both master and this branch, and is outside the scope of this PR (native Windows OpenSSH support). Tracking for a separate follow-up; the commit message on f98222d5 has the full trail.

limactl shell --sync is left unchanged; postponed to a follow-up

limactl shell --sync DIR currently does an early exec.LookPath("rsync") and exits with rsync is required for --sync but not found when rsync is unavailable. After this branch:
- Plain Windows: --sync still fails immediately. Was already failing before; nothing about this PR changes that.
- Git for Windows: --sync also fails. Despite my (incorrect) earlier claim, Git for Windows does not include rsync in its bundle, only ssh, scp, ssh-keygen, etc. Users have to manually drop rsync into Git\usr\bin\ (there is a well-known how-to for it). I learned this while writing this PR.
- MSYS2: --sync works after pacman -S rsync.

A natural fix is to fall back to scp when rsync is missing. pkg/copytool already has the auto-fallback machinery, but cmd/limactl/shell.go:223 opts out of it by hardcoding BackendRsync. Switching to BackendAuto is small. The reason it is not in this PR is that the scp fallback is a meaningful behaviour change for --sync, because scp has no equivalent of:
- --delete (propagate file removals)
- itemized change stats (--itemize-changes)

The most consequential gap is --delete. If the user adds a file on the host, syncs, deletes it on the host, and re-syncs, rsync would propagate the deletion to the guest; scp would leave a phantom file. The diff-view ("View the changed contents") path also degrades: without --itemize-changes the prompt collapses to a generic "Accept all changes? (y/n)" without specifics, unless we re-implement file-tree comparison in Go.

A workable fallback design would be: when rsync is unavailable, switch to BackendAuto, log a one-time warning that --delete is disabled and stats will be coarse, and either skip the itemized-stats prompt or replace it with a native tree walk. None of that is hard, but the UX policy decision (warn-and-degrade vs flag-gated --sync-tool=scp opt-in vs reject) deserves its own discussion. Tracking as a follow-up.

Other deferred items
- The nolintlint linter is disabled in .golangci.yml. This silences a real but flaky failure mode (analysis-cache nondeterminism causes the SA1019 deprecation to drop in and out of the input to nolintlint, which then flags the //nolint:staticcheck directive at pkg/driver/external/client/client.go:35 as unused). The proper fix is to migrate grpc.Dial to grpc.NewClient and remove the directive entirely; out of scope here.
- hack/test-templates.sh is unchanged in this PR. It still uses cygpath, bash arrays, netcat, etc., and still runs under MSYS2 bash in the existing windows-wsl2 and windows-qemu jobs. Rewriting it for plain Windows is a separate, larger project.
- [...] .tar.gz, so the gzip path is the one that mattered. The others can be migrated similarly when needed.
- pkg/hostagent/mount.go uses PathForSSH (toolchain-aware) rather than WindowsSubsystemPath (cygwin-style always). Persisting LocalPath at create time would remove the toolchain-swap hazard described in the new WSL2 docs note (review S7); kept out of scope here.

CI changes
Existing jobs (windows-wsl2, windows-qemu)
- Removed _LIMA_WINDOWS_EXTRA_PATH = 'C:\Program Files\Git\usr\bin'. After this branch, limactl does not need anything from there. Strict subset of the old environment.
- Removed the _LIMA_WINDOWS_EXTRA_PATH entry from MSYS2_ENV_CONV_EXCL (it was only there because we were setting the var).
- hack/test-templates.sh still skips the mount-home check on Msys (see the section above for the reason).

New jobs
Two new jobs, mirroring the two existing Windows drivers but on a plain-Windows host:
- windows-plain-wsl2: builds with go build (no make, no MSYS2 bash needed for the build step), scrubs PATH of MSYS2 and Git for Windows, then runs a PowerShell smoke test (create → start → shell → copy → stop → delete) against the same WSL2 rootfs template the existing windows-wsl2 job uses.
- windows-plain-qemu: same shape as windows-plain-wsl2, but installs QEMU 10.2.0 and uses templates/default.yaml (matching windows-qemu). Smoke test covers the same create/start/shell/copy/stop/delete sequence. The reverse-sshfs mount itself is not exercised in this job because it hits the pre-existing Ubuntu-25.10 fusermount3 issue described above.

Both new jobs:
- Log PATH before and after the scrub.
- Assert that ssh resolves to C:\Windows\System32\OpenSSH\ssh.exe specifically.
- Sort tools into required, forbidden (including cygpath and pacman, which must not resolve at all), and optional (logged for context only). Fails with an actionable message if a required tool is missing or a forbidden one is found.
- Pass --debug to every limactl invocation so the workflow log captures the toolchain-detection result, ssh args, path-translation decisions, etc.
- An if: failure() step dumps ha.stderr.log, ha.stdout.log, serial.log, lima.yaml, ssh.config from the instance directory.

What this PR is hoping CI will confirm
1. windows-plain-wsl2 passes end-to-end. Already confirmed in a previous run on this branch.
2. windows-plain-qemu passes end-to-end for the non-mount path (create / start / shell / copy / stop / delete via native OpenSSH + native sftp-server autodetection).
3. windows-wsl2 still passes with _LIMA_WINDOWS_EXTRA_PATH removed. Strict subset of the old environment, but the only way to confirm there is no hidden dependency is to run it.
4. --debug produces enough trail to diagnose any failure without local repro. The new debug logs (toolchain detection, in-process gzip, mux-stripping, reverse-sshfs LocalPath, native cygpath fallback, ParseOpenSSHVersion fallback) plus the failure-mode hostagent log dump are designed to make any CI failure self-diagnosable.

If items 1–3 fail in interesting ways, those failures themselves are the data we want from this PR. The branch is structured as small focused commits to make per-commit revert easy.