Skip to content

Commit 66a9a11

Browse files
fix(ado): resolve az binary via shutil.which for Windows az.cmd (closes #1430) (#1432)
* fix(ado): resolve az binary via shutil.which for Windows az.cmd (closes #1430) On Windows the Azure CLI ships as az.cmd (a batch wrapper). Python's subprocess.run -> CreateProcessW does NOT honor PATHEXT for non-.exe executables, so a bare "az" token cannot be resolved and raises FileNotFoundError. shutil.which("az") DOES honor PATHEXT and finds az.cmd, so AzureCliBearerProvider.is_available() returned True while every actual subprocess call failed. The error then cascaded through the ADO --update preflight, killing Git Credential Manager along the way, and surfaced as the misleading 'az present but not logged in' diagnostic -- even when the user WAS logged in via az login. Layer 1 (root cause): resolve the az executable once in __init__ via shutil.which and store the absolute path. Use it in every subprocess call (_run_get_access_token, get_current_tenant_id). is_available() now mirrors the same construction-time resolution so it can no longer disagree with get_bearer_token()'s pre-check. Explicit absolute / separator-bearing paths from callers (tests) are trusted verbatim. Layer 2 (defense in depth): pipeline.py _preflight_auth_check now strips GIT_CONFIG_GLOBAL / GIT_CONFIG_NOSYSTEM / GIT_ASKPASS from the probe env for ADO hosts too (previously only generic). This lets Git Credential Manager answer for Entra-cached ADO credentials when az bearer acquisition is unavailable for any reason -- the #1430 PATH quirk, sandbox, proxy, etc. The actual clone path remains isolated (it builds its own clean env via GitAuthEnvBuilder.setup_environment), so the security rationale for empty global gitconfig during clone is preserved. Only the single ls-remote probe leg gains GCM access. Tests: - New TestWindowsAzCmdResolution class with the exact regression shape: shutil.which returns a Windows az.cmd path; verify subprocess receives it as argv[0] for both get_bearer_token and tenant probe. - New is_available stability test (resolution is one-shot). - Init absolute-path bypass test (preserves test injection contract). - Existing test_ado_host_retains_credential_blocking_env renamed and inverted to assert the new contract: ADO preflight no longer kills GCM. Existing ADO bearer-fallback E2E unchanged (the fake-git path has no GCM installed so behavior is unaffected). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(ado): address panel review (#1430 round 2) Three nits surfaced by auth-expert + supply-chain-security panel: 1. Tighten the explicit-path bypass in AzureCliBearerProvider.__init__ from 'isabs OR contains os.sep' to 'isabs only'. The os.sep heuristic accepted relative-with-separator tokens like 'subdir/az' verbatim, handing a CWD-relative path to subprocess.run without going through shutil.which. Only absolute paths are now trusted verbatim; everything else flows through resolution. New test test_init_relative_with_separator_still_resolves locks the contract. 2. Add an explicit early-return in get_current_tenant_id when _az_command is None. Previously this case relied on the broad 'except Exception' to swallow a TypeError from passing None as argv[0]. The new guard mirrors get_bearer_token's is_available() pre-check. New test test_get_current_tenant_id_returns_none_without_subprocess_when_az_missing asserts subprocess.run is never invoked in that path. 3. Add an explicit FileNotFoundError regression-trap test (test_bare_az_would_raise_filenotfound_but_resolved_path_succeeds) that simulates the exact Windows CreateProcessW behaviour: argv[0] == 'az' raises FileNotFoundError, argv[0] == resolved .cmd path succeeds. Pins both halves of the #1430 contract in a single test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs(changelog): add #1430 entry under Unreleased > Fixed Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: danielmeppiel <danielmeppiel@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent a245f94 commit 66a9a11

5 files changed

Lines changed: 210 additions & 12 deletions

File tree

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1313

1414
### Fixed
1515

16+
- `apm update` against private Azure DevOps deps no longer fails on Windows with a misleading "az present but not logged in" diagnostic when the user IS signed in via `az login`. Root cause: Python's `subprocess.run(["az", ...])` -> `CreateProcessW` does not honor `PATHEXT` for non-`.exe` executables, so the Windows `az.cmd` wrapper could not be invoked even though `shutil.which("az")` resolved it. `AzureCliBearerProvider` now resolves the `az` binary via `shutil.which` once at construction and passes the absolute path to every subprocess call. As a defense-in-depth measure, the ADO `--update` preflight probe no longer strips `GIT_CONFIG_GLOBAL` / `GIT_CONFIG_NOSYSTEM` / `GIT_ASKPASS`, so Git Credential Manager can answer for Entra-cached ADO credentials whenever bearer acquisition is unavailable for any reason (sandbox, proxy, future PATH quirks). The actual clone path keeps its full gitconfig isolation. (#1430)
17+
1618
- Root `.apm` hooks no longer duplicate after renaming the project directory or using git worktrees; Claude, Codex, Cursor, Gemini, and Windsurf hook configs stay idempotent across checkouts. The hook source-id is now derived from `apm.yml`'s `name` field instead of `install_path.name`, and `apm install` silently heals stale same-content entries from prior checkout basenames. Copilot is unaffected (its hooks live in per-file namespaces under `.github/hooks/`, not a shared merged config). (#1392, closes #1329)
1719

1820
## [0.14.1] - 2026-05-20

src/apm_cli/core/azure_cli.py

Lines changed: 32 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@
2020
from __future__ import annotations
2121

2222
import json
23+
import os
2324
import shutil
2425
import subprocess
2526
import threading
@@ -82,7 +83,25 @@ class AzureCliBearerProvider:
8283
_EXPIRY_SLACK_SECONDS: int = 60
8384

8485
def __init__(self, az_command: str = "az") -> None:
85-
self._az_command = az_command
86+
# Resolve the az executable once at construction. On Windows the
87+
# Azure CLI ships as az.cmd (a batch wrapper), and Python's
88+
# subprocess.run -> CreateProcessW does NOT honor PATHEXT for
89+
# non-.exe executables, so a bare "az" token fails with
90+
# FileNotFoundError even though `shutil.which("az")` resolves it.
91+
# See microsoft/apm#1430. Resolving via shutil.which here (which
92+
# DOES honor PATHEXT) and storing the absolute path keeps every
93+
# downstream subprocess.run call portable across platforms.
94+
#
95+
# Edge case: callers (notably tests) may pass an explicit absolute
96+
# path. We trust absolute paths verbatim and skip re-resolution.
97+
# Relative tokens always flow through shutil.which so a
98+
# CWD-relative form like "subdir/az" cannot bypass resolution.
99+
#
100+
# PATH changes mid-process won't be detected; restart the CLI.
101+
if os.path.isabs(az_command):
102+
self._az_command: str | None = az_command
103+
else:
104+
self._az_command = shutil.which(az_command)
86105
# Cache stores (token, expires_at_epoch_seconds). expires_at is None
87106
# if the response did not include an expiresOn field (very old az
88107
# versions); in that case the token is treated as never-expiring
@@ -93,12 +112,16 @@ def __init__(self, az_command: str = "az") -> None:
93112
# -- public API ---------------------------------------------------------
94113

95114
def is_available(self) -> bool:
96-
"""Return True iff the ``az`` binary is on PATH.
115+
"""Return True iff the ``az`` binary was found on PATH at construction.
116+
117+
Resolution is one-shot (performed in :meth:`__init__`). PATH changes
118+
made after the provider was constructed will NOT be observed; restart
119+
the CLI if you have installed ``az`` mid-session.
97120
98121
Does NOT check whether the user is logged in -- that requires a
99122
subprocess call and is deferred to :meth:`get_bearer_token`.
100123
"""
101-
return shutil.which(self._az_command) is not None
124+
return self._az_command is not None
102125

103126
def get_bearer_token(self) -> str:
104127
"""Acquire (or return cached) bearer token for Azure DevOps.
@@ -146,6 +169,12 @@ def get_current_tenant_id(self) -> str | None:
146169
Uses ``az account show --query tenantId -o tsv``. Returns ``None``
147170
on any failure -- this method never raises.
148171
"""
172+
# Explicit early-return when az was not resolved at __init__: avoids
173+
# passing None as argv[0] to subprocess.run (which would TypeError
174+
# and only be swallowed by the broad except below). Mirrors the
175+
# is_available() guard that get_bearer_token() does.
176+
if not self._az_command:
177+
return None
149178
try:
150179
result = subprocess.run(
151180
[self._az_command, "account", "show", "--query", "tenantId", "-o", "tsv"],

src/apm_cli/install/pipeline.py

Lines changed: 15 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -157,7 +157,21 @@ def _trace(line: str) -> None:
157157
)
158158
_ctx_env = getattr(dep_ctx, "git_env", {}) or {}
159159
probe_env = {**os.environ, **_dl.git_env, **_ctx_env}
160-
if is_generic:
160+
# GIT_CONFIG_GLOBAL / GIT_CONFIG_NOSYSTEM carve-out: GitAuthEnvBuilder
161+
# forces an empty global gitconfig for ALL hosts to prevent a user's
162+
# ~/.gitconfig insteadOf rewrites or credential helpers from leaking
163+
# tokens during a clone. But for preflight probes (a single ls-remote
164+
# against the same host the dep targets), the redirection surface is
165+
# nil and killing the user's global config kills Git Credential
166+
# Manager along with it -- the helper most Windows ADO users rely on
167+
# for Entra-cached credentials. For ADO specifically that matters
168+
# because bearer acquisition can fail for reasons unrelated to login
169+
# state (sandbox, proxy, microsoft/apm#1430-style PATH quirks), and
170+
# GCM is the only remaining channel that can save us. Generic hosts
171+
# have the same logic; widening the carve-out to ADO keeps the
172+
# actual clone path isolated (it builds its own clean env) while
173+
# giving the preflight probe the best chance to succeed.
174+
if is_generic or is_azure_devops_hostname(host):
161175
for _key in ("GIT_CONFIG_GLOBAL", "GIT_CONFIG_NOSYSTEM", "GIT_ASKPASS"):
162176
probe_env.pop(_key, None)
163177

tests/unit/install/test_pipeline_auth_preflight.py

Lines changed: 15 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -194,11 +194,17 @@ def test_generic_host_auth_failure_still_raises(self, mock_run):
194194
assert str(exc_info.value).startswith("Authentication failed for ghes.corp.example.com")
195195

196196
@patch("subprocess.run")
197-
def test_ado_host_retains_credential_blocking_env(self, mock_run):
198-
"""ADO hosts should retain GIT_ASKPASS (locked-down env with token in URL).
199-
200-
Generic hosts strip GIT_ASKPASS to allow credential helpers; ADO hosts
201-
keep it because auth is via token embedded in the URL.
197+
def test_ado_host_omits_credential_blocking_env(self, mock_run):
198+
"""ADO hosts strip GIT_CONFIG_GLOBAL / GIT_CONFIG_NOSYSTEM / GIT_ASKPASS
199+
from the preflight probe env so Git Credential Manager can answer
200+
for Entra-cached ADO creds when bearer acquisition is unavailable
201+
(microsoft/apm#1430 -- Windows az.cmd resolution failure, sandboxed
202+
runs, proxy issues, etc.).
203+
204+
The actual clone path remains isolated -- it builds its own clean
205+
env via GitAuthEnvBuilder.setup_environment(). This carve-out only
206+
widens the preflight PROBE leg so GCM gets a chance to authenticate
207+
before we surface a misleading "az not logged in" diagnostic.
202208
"""
203209
mock_run.return_value = MagicMock(returncode=0, stderr="", stdout="")
204210

@@ -219,9 +225,10 @@ def test_ado_host_retains_credential_blocking_env(self, mock_run):
219225
_preflight_auth_check(ctx, resolver, verbose=False)
220226

221227
call_env = mock_run.call_args[1]["env"]
222-
# ADO hosts keep the locked-down env since tokens are embedded in the URL
223-
assert call_env.get("GIT_CONFIG_NOSYSTEM") == "1"
224-
assert call_env.get("GIT_ASKPASS") == "echo"
228+
# Layer 2 of #1430 fix: ADO preflight no longer kills GCM.
229+
assert "GIT_CONFIG_NOSYSTEM" not in call_env
230+
assert "GIT_ASKPASS" not in call_env
231+
assert "GIT_CONFIG_GLOBAL" not in call_env
225232

226233

227234
# ---------------------------------------------------------------------------

tests/unit/test_azure_cli.py

Lines changed: 146 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,152 @@ def test_is_available_when_az_missing(self):
3232
provider = AzureCliBearerProvider()
3333
assert provider.is_available() is False
3434

35+
def test_is_available_stable_after_construction(self):
36+
"""is_available reflects construction-time resolution only.
37+
38+
Once the provider is built, mid-process PATH changes (or shutil.which
39+
side effects) must NOT flip the answer. This is the contract that
40+
keeps is_available() consistent with get_bearer_token()'s pre-check
41+
and avoids a "True now / az_not_found a moment later" race.
42+
"""
43+
with patch("apm_cli.core.azure_cli.shutil.which", return_value="/usr/bin/az"):
44+
provider = AzureCliBearerProvider()
45+
# Re-enter a context where which() would now say "missing": the
46+
# already-constructed provider must still report True.
47+
with patch("apm_cli.core.azure_cli.shutil.which", return_value=None):
48+
assert provider.is_available() is True
49+
50+
51+
# ---------------------------------------------------------------------------
52+
# Windows az.cmd resolution (regression for microsoft/apm#1430)
53+
# ---------------------------------------------------------------------------
54+
55+
56+
class TestWindowsAzCmdResolution:
57+
"""Regression: subprocess.run must receive the shutil.which-resolved
58+
path, not the bare "az" token. On Windows the resolver returns
59+
"C:\\\\...\\\\az.cmd" and CreateProcessW cannot find "az" alone."""
60+
61+
WINDOWS_AZ_CMD = r"C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin\az.cmd"
62+
63+
def test_init_resolves_via_shutil_which(self):
64+
with patch(
65+
"apm_cli.core.azure_cli.shutil.which", return_value=self.WINDOWS_AZ_CMD
66+
) as mock_which:
67+
provider = AzureCliBearerProvider()
68+
mock_which.assert_called_once_with("az")
69+
assert provider._az_command == self.WINDOWS_AZ_CMD
70+
71+
def test_init_stores_none_when_not_on_path(self):
72+
with patch("apm_cli.core.azure_cli.shutil.which", return_value=None):
73+
provider = AzureCliBearerProvider()
74+
assert provider._az_command is None
75+
76+
def test_init_absolute_path_skips_resolution(self):
77+
"""Caller passed an explicit absolute path -- trust verbatim."""
78+
with patch("apm_cli.core.azure_cli.shutil.which") as mock_which:
79+
provider = AzureCliBearerProvider(az_command="/opt/custom/az")
80+
mock_which.assert_not_called()
81+
assert provider._az_command == "/opt/custom/az"
82+
83+
def test_init_relative_with_separator_still_resolves(self):
84+
"""A relative-with-separator token like 'subdir/az' must NOT bypass
85+
shutil.which; otherwise a caller could hand subprocess.run a
86+
CWD-relative path that the OS resolves against the wrong
87+
directory. Only absolute paths are trusted verbatim."""
88+
with patch("apm_cli.core.azure_cli.shutil.which", return_value="/usr/bin/az") as mock_which:
89+
provider = AzureCliBearerProvider(az_command="subdir/az")
90+
mock_which.assert_called_once_with("subdir/az")
91+
assert provider._az_command == "/usr/bin/az"
92+
93+
def test_get_bearer_token_invokes_resolved_az_cmd_path(self):
94+
"""The exact Windows shape: shutil.which returns az.cmd; verify it
95+
flows into subprocess.run as the argv[0]. Without the fix, argv[0]
96+
would be the bare "az" token and CreateProcessW would fail."""
97+
mock_result = MagicMock()
98+
mock_result.returncode = 0
99+
mock_result.stdout = FAKE_JWT + "\n"
100+
mock_result.stderr = ""
101+
102+
with (
103+
patch("apm_cli.core.azure_cli.shutil.which", return_value=self.WINDOWS_AZ_CMD),
104+
patch("apm_cli.core.azure_cli.subprocess.run", return_value=mock_result) as mock_run,
105+
):
106+
provider = AzureCliBearerProvider()
107+
provider.get_bearer_token()
108+
assert mock_run.call_count == 1
109+
cmd = mock_run.call_args[0][0]
110+
assert cmd[0] == self.WINDOWS_AZ_CMD, (
111+
f"subprocess.run must receive the resolved az.cmd path, got: {cmd[0]!r}"
112+
)
113+
114+
def test_bare_az_would_raise_filenotfound_but_resolved_path_succeeds(self):
115+
"""Explicit regression trap for the #1430 cascade.
116+
117+
Pre-fix: subprocess.run(["az", ...]) raised FileNotFoundError on
118+
Windows (CreateProcessW does not honor PATHEXT for az.cmd), which
119+
_run_get_access_token caught as AzureCliBearerError(kind=
120+
"subprocess_error"). The error propagated and was rendered as the
121+
misleading "az present but not logged in" Case 3 diagnostic.
122+
123+
Post-fix: the constructor resolves to the .cmd absolute path, so
124+
subprocess.run receives a path CreateProcessW CAN find and the
125+
bearer succeeds. This test pins both halves of the contract:
126+
bare 'az' -> FileNotFoundError; resolved path -> success.
127+
"""
128+
mock_result = MagicMock()
129+
mock_result.returncode = 0
130+
mock_result.stdout = FAKE_JWT + "\n"
131+
mock_result.stderr = ""
132+
133+
def fake_run(cmd, *args, **kwargs):
134+
if cmd[0] == "az":
135+
# Simulate the Windows CreateProcessW behavior pre-fix.
136+
raise FileNotFoundError(2, "No such file or directory", "az")
137+
return mock_result
138+
139+
with (
140+
patch("apm_cli.core.azure_cli.shutil.which", return_value=self.WINDOWS_AZ_CMD),
141+
patch("apm_cli.core.azure_cli.subprocess.run", side_effect=fake_run),
142+
):
143+
provider = AzureCliBearerProvider()
144+
# Would raise AzureCliBearerError(kind='subprocess_error') pre-fix.
145+
token = provider.get_bearer_token()
146+
assert token == FAKE_JWT
147+
148+
def test_get_current_tenant_id_invokes_resolved_az_cmd_path(self):
149+
"""Same regression for the get_current_tenant_id() probe -- this
150+
was the second swallowed-failure that drove the misleading Case 3
151+
diagnostic on Windows."""
152+
mock_result = MagicMock()
153+
mock_result.returncode = 0
154+
mock_result.stdout = "72f988bf-86f1-41af-91ab-2d7cd011db47\n"
155+
mock_result.stderr = ""
156+
157+
with (
158+
patch("apm_cli.core.azure_cli.shutil.which", return_value=self.WINDOWS_AZ_CMD),
159+
patch("apm_cli.core.azure_cli.subprocess.run", return_value=mock_result) as mock_run,
160+
):
161+
provider = AzureCliBearerProvider()
162+
provider.get_current_tenant_id()
163+
assert mock_run.call_count == 1
164+
cmd = mock_run.call_args[0][0]
165+
assert cmd[0] == self.WINDOWS_AZ_CMD
166+
167+
def test_get_current_tenant_id_returns_none_without_subprocess_when_az_missing(self):
168+
"""Explicit early-return guard: when az was not resolved at __init__,
169+
get_current_tenant_id() must short-circuit to None rather than
170+
passing None as argv[0] (which would TypeError and only be caught
171+
by the broad except). Mirrors get_bearer_token's is_available()
172+
pre-check."""
173+
with (
174+
patch("apm_cli.core.azure_cli.shutil.which", return_value=None),
175+
patch("apm_cli.core.azure_cli.subprocess.run") as mock_run,
176+
):
177+
provider = AzureCliBearerProvider()
178+
assert provider.get_current_tenant_id() is None
179+
mock_run.assert_not_called()
180+
35181

36182
# ---------------------------------------------------------------------------
37183
# get_bearer_token

0 commit comments

Comments
 (0)