Skip to content

test(codeql): fix 30 py/incomplete-url-substring-sanitization alerts#1492

Merged
danielmeppiel merged 1 commit into
mainfrom
danielmeppiel/fix-codeql-url-substring-alerts
May 26, 2026
Merged

test(codeql): fix 30 py/incomplete-url-substring-sanitization alerts#1492
danielmeppiel merged 1 commit into
mainfrom
danielmeppiel/fix-codeql-url-substring-alerts

Conversation

@danielmeppiel

Copy link
Copy Markdown
Collaborator

Summary

Fixes all 30 open CodeQL py/incomplete-url-substring-sanitization (high-severity) alerts across the test suite by replacing substring assertions on hostname-shaped literals with parsed-component checks, per the canonical test rule in .github/instructions/tests.instructions.md.

Alerts addressed: #127#156 (each in both the canonical and _phase3 test mirror).

Why

Substring assertions like assert "host.example.com" in msg or
assert "https://x" in url are flagged by CodeQL as
py/incomplete-url-substring-sanitization (high severity, "the string
may be at an arbitrary position in the URL") and will fail CI.
.github/instructions/tests.instructions.md

What changed

Three substitution patterns, applied test-by-test based on what the assertion really targets:

Original assertion shape Replacement
Hostname in a URL string urllib.parse.urlparse(url).hostname == "host"
Hostname embedded in error/log text (no scheme) re.search(r"\bhost\.tld\b", text) (word-boundary regex)
Slug-style canonical form (host/owner/repo) canonical.split("/", 1)[0] == "host"

For markdown / multi-URL output, URLs are extracted with a regex and asserted with hostname equality on the parsed token — mirroring the _printed_urls helper pattern already used in tests/unit/test_mcp_command.py.

Files (19)

Each touched file has a canonical / _phase3 mirror, so changes are paired:

  • tests/unit/test_github_host.py
  • tests/unit/install/test_validation_phase3.py + test_validation_error_handling.py
  • tests/unit/install/test_pipeline_phase3.py + test_pipeline_preflight.py
  • tests/unit/compilation/test_link_resolver_phase3.py + test_link_resolver_resolution.py
  • tests/unit/deps/test_download_strategies_phase3.py + test_download_strategies_selection.py (5 alerts each)
  • tests/integration/test_uninstall_policy_pack_phase3.py + test_uninstall_policy_pack_flow.py
  • tests/integration/test_runner_reference_phase3.py + test_runner_reference_resolution.py (3 alerts each)
  • tests/integration/test_remaining_modules_phase3.py + test_remaining_modules_coverage.py
  • tests/integration/test_marketplace_publisher_phase3.py + test_marketplace_publisher_flow.py
  • tests/integration/test_integrators_validation_phase3b.py + test_integrators_validation_rules.py

Validation

  • uv run --extra dev ruff check src/ tests/ — clean
  • uv run --extra dev ruff format --check src/ tests/ — clean
  • uv run --extra dev python -m pylint --disable=all --enable=R0801 --min-similarity-lines=10 --fail-on=R0801 src/apm_cli/ — clean
  • bash scripts/lint-auth-signals.sh — clean
  • pytest on all 19 touched files — 1951 passed

Scope notes

  • Test-only changes; no production behaviour modified.
  • Semantics preserved: each assertion still checks the same hostname / URL component, just via parsed equality rather than substring.

Not in scope

One open alert is not included:

  • docs: fix CLI reference drift from consistency reports #140 and #152 #160 actions/untrusted-checkout/high in .github/workflows/daily-test-improver.lock.yml. This file is auto-generated by gh-aw (compiler_version: v0.76.1); manual edits would be overwritten on next compile. The flagged checkout step is guarded by an if excluding issue_comment / pull_request_review_comment events, but CodeQL flags the workflow holistically. Suggested follow-up: dismiss as "won't fix" with that explanation, or upgrade gh-aw once an upstream fix lands.

CodeQL py/incomplete-url-substring-sanitization fires on substring
assertions against hostname-shaped literals (e.g. assert "github.com"
in error_msg). Per .github/instructions/tests.instructions.md, replace
with:

* urllib.parse.urlparse(...).hostname == "host" for URL strings
* re.search(r"\bhost\.tld\b", text) word-boundary regex for
  hostnames embedded in error/log text (not URLs)
* str.partition("/")[0] == "host" for slug-style canonical forms

Fixes 30 high-severity CodeQL alerts (#127-#156) across canonical and
phase3 test mirrors. Semantics preserved; all 1951 touched tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 26, 2026 21:41

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Replaces 30 substring-based hostname assertions across 19 test files with parsed-component checks (using urllib.parse.urlparse, word-boundary regex, or string partitioning) to silence CodeQL py/incomplete-url-substring-sanitization alerts. Test semantics are preserved; no production code is touched.

Changes:

  • Substitute assert "host" in url with urlparse(url).hostname == "host" for URL-shaped strings.
  • Substitute assert "host.tld" in text with re.search(r"\bhost\.tld\b", text) for error/log text containing bare hostnames.
  • For canonical/slug forms like host/owner/repo, assert on the first path segment via split("/", 1)[0].
Show a summary per file
File Description
tests/unit/test_github_host.py Word-boundary regex for github.com in unsupported-host error.
tests/unit/install/test_validation_{phase3,error_handling}.py Regex \bexample\.com\b for TLS verbose-log assertion.
tests/unit/install/test_pipeline_{phase3,preflight}.py Regex \bdev\.azure\.com\b for auth error message.
tests/unit/compilation/test_link_resolver_{phase3,resolution}.py Extract URLs with regex, assert hostname via urlparse.
tests/unit/deps/test_download_strategies_{phase3,selection}.py Module-level urlparse import; hostname equality on built API URLs.
tests/integration/test_uninstall_policy_pack_{phase3,flow}.py First-segment partition on repo ref to validate host prefix.
tests/integration/test_runner_reference_{phase3,resolution}.py First-segment split on canonical/str forms to verify non-default host.
tests/integration/test_remaining_modules_{phase3,coverage}.py URL extraction + hostname equality for preserved external link.
tests/integration/test_marketplace_publisher_{phase3,flow}.py Word-boundary regex on supported-host guidance string.
tests/integration/test_integrators_validation_{phase3b,rules}.py Word-boundary regex on TLS verbose-log host check.

Copilot's findings

  • Files reviewed: 19/19 changed files
  • Comments generated: 0

@danielmeppiel danielmeppiel merged commit 3e8dcdc into main May 26, 2026
21 checks passed
@danielmeppiel danielmeppiel deleted the danielmeppiel/fix-codeql-url-substring-alerts branch May 26, 2026 21:59
@danielmeppiel danielmeppiel mentioned this pull request May 26, 2026
danielmeppiel added a commit that referenced this pull request May 26, 2026
* chore: cut 0.15.0

Move Unreleased -> [0.15.0] - 2026-05-27 and bump pyproject + uv.lock.

Audit applied: every PR merged since v0.14.2 has exactly one
changelog entry; each entry leads with the user-visible impact.

Fixes during audit:
- Add missing entries for #1367, #1403, #1465, #1487, #1492, #1462,
  #1477, #1439, #1484, and the 131679f follow-up commit.
- Collapse the two #1473 lines into one.
- Merge the #1476 Security/GitCache-hardening entry into its Added
  entry (same PR, one logical change).
- Replace bogus #1243 PR ref with the actual merge PR #1308 for the
  persisted transport-flag config.
- Relocate the #1324-delivered marketplace CLI entries (apm pack
  --marketplace / --marketplace-path / --json, outputs map form)
  out of Unreleased and into [0.14.2], where they actually shipped.
  They were mis-attributed to #1317 and orphaned across the 0.14.2
  cut.

Verified locally: ruff check + ruff format --check both clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: danielmeppiel <danielmeppiel@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants