
test: address AI findings in recent test changes #1234

Merged
mldangelo-oai merged 2 commits into main from mdangelo/codex/fix-ai-findings-20260508
May 9, 2026


@mldangelo-oai

Summary

  • clarify several recent test helpers and thresholds surfaced by GitHub AI findings
  • make the NumPy v2 coverage test write and assert a real v2 .npy file
  • make large-file fallback coverage use realistic scanner fakes while preserving the intended fail-closed behavior
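For readers unfamiliar with the NumPy v2 on-disk format mentioned above, here is a minimal sketch of what "a real v2 .npy file" looks like at the byte level. It is not the test code from this PR: `build_npy_v2` and `read_npy_version` are hypothetical helper names, and the sketch builds the payload with the standard library only so the layout is visible. A version 2.0 file starts with the magic string `\x93NUMPY`, the version bytes `2, 0`, and a 4-byte little-endian header length (version 1.0 uses a 2-byte length).

```python
import struct

MAGIC = b"\x93NUMPY"


def build_npy_v2(values):
    """Serialize a 1-D list of ints as a version 2.0 .npy payload (little-endian int64).

    Hypothetical helper for illustration; not taken from the PR.
    """
    header = "{'descr': '<i8', 'fortran_order': False, 'shape': (%d,), }" % len(values)
    prefix_len = len(MAGIC) + 2 + 4  # magic + two version bytes + 4-byte header length
    # Pad with spaces so that prefix + header (including the trailing newline)
    # is a multiple of 64 bytes.
    pad = -(prefix_len + len(header) + 1) % 64
    header_bytes = (header + " " * pad + "\n").encode("latin1")
    body = struct.pack("<%dq" % len(values), *values)
    return MAGIC + bytes([2, 0]) + struct.pack("<I", len(header_bytes)) + header_bytes + body


def read_npy_version(payload):
    """Return the (major, minor) format version stored after the magic string."""
    if payload[:6] != MAGIC:
        raise ValueError("not an .npy payload")
    return payload[6], payload[7]
```

A test that wants to prove it exercised the v2 path can assert on the version bytes rather than on the file extension alone, for example `assert read_npy_version(payload) == (2, 0)`.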

Validation

  • UV_CACHE_DIR=/tmp/modelaudit-uv-cache /opt/homebrew/bin/uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • UV_CACHE_DIR=/tmp/modelaudit-uv-cache /opt/homebrew/bin/uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • UV_CACHE_DIR=/tmp/modelaudit-uv-cache /opt/homebrew/bin/uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 /opt/homebrew/bin/uv run pytest -n auto -m "not slow and not integration" --maxfail=1

@github-actions

github-actions Bot commented May 8, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 622.05ms -> 627.02ms (+0.8%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 72.92ms | 78.99ms | +8.3% | stable |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 16.62ms | 15.48ms | -6.8% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 133.9us | 131.2us | -2.1% | stable |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 18.74ms | 18.40ms | -1.8% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 460.5us | 466.5us | +1.3% | stable |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 35.85ms | 35.50ms | -1.0% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 136.7us | 135.4us | -0.9% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 250.17ms | 251.59ms | +0.6% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 402.7us | 400.9us | -0.4% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 190.05ms | 189.35ms | -0.4% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 126.8us | 126.3us | -0.3% | stable |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 36.45ms | 36.44ms | -0.0% | stable |
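The Change and Status columns can be reproduced from the Baseline and Current timings. The sketch below shows one plausible way to do that. It assumes the 15% threshold is applied symmetrically (slower than +15% is a regression, faster than -15% is improved); the report only states the threshold, so the exact rule and the function names `percent_change` and `classify` are assumptions, not the workflow's actual code.

```python
def percent_change(baseline_ms, current_ms):
    """Relative change of current versus baseline, in percent."""
    return (current_ms - baseline_ms) / baseline_ms * 100.0


def classify(baseline_ms, current_ms, threshold_pct=15.0):
    """Label a benchmark as regression, improved, or stable.

    Assumption: the threshold is symmetric around zero.
    """
    change = percent_change(baseline_ms, current_ms)
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:
        return "improved"
    return "stable"


# Aggregate median from the report: 622.05ms -> 627.02ms
print(round(percent_change(622.05, 627.02), 1))  # 0.8
print(classify(72.92, 78.99))  # stable (about +8.3%, under the 15% threshold)
```

Under this reading, every row in the table stays inside the band, which matches the reported 12 stable benchmarks.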

@mldangelo-oai mldangelo-oai merged commit 5aa48f2 into main May 9, 2026
30 checks passed
@mldangelo-oai mldangelo-oai deleted the mdangelo/codex/fix-ai-findings-20260508 branch May 9, 2026 01:52