fix: avoid duplicate sharded directory scans #1229

Closed
jessejam wants to merge 1 commit into promptfoo:main from jessejam:fix/sharded-directory-dedup

Conversation

@jessejam (Contributor) commented May 8, 2026

Summary

  • Deduplicate sharded model families during directory scans so a family is scanned once instead of once per shard.
  • Preserve per-shard file counts, assets, and file metadata while counting family bytes only once.
  • Keep shard-family cache entries valid by fingerprinting every present shard path/content hash.
  • Keep aggregate content_hash limited to scan entries that actually ran when scans stop early.
  • Add regression coverage for complete, incomplete, cache-sensitive, and early-stopped sharded/directory scans.
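
The first and third bullets can be illustrated with a minimal sketch. This is not the code in `modelaudit/core.py`: the shard filename pattern (`name-00001-of-00003.ext`) and the helpers `shard_family_key`, `plan_scan_entries`, and `family_fingerprint` are all hypothetical names chosen for illustration. It shows one way to collapse shard files into a single scan entry per family, and to derive a cache fingerprint that changes whenever any present shard's path or content hash changes.

```python
import hashlib
import re
from pathlib import Path
from typing import Dict, List, Mapping, Optional, Tuple

# Assumed shard naming scheme, e.g. "model-00001-of-00003.safetensors".
SHARD_RE = re.compile(
    r"^(?P<stem>.+)-(?P<index>\d+)-of-(?P<total>\d+)(?P<ext>\.[^.]+)$"
)


def shard_family_key(path: str) -> Optional[str]:
    """Return a key shared by all shards of one family, or None for other files."""
    p = Path(path)
    m = SHARD_RE.match(p.name)
    if m is None:
        return None
    # Drop the per-shard index so every sibling shard maps to the same key.
    return str(p.parent / f"{m['stem']}-of-{m['total']}{m['ext']}")


def plan_scan_entries(paths: List[str]) -> List[Tuple[str, List[str]]]:
    """Group paths into scan entries: one per shard family, one per other file.

    Entries keep first-seen order; members of each entry are sorted.
    """
    entries: Dict[str, List[str]] = {}
    for path in paths:
        key = shard_family_key(path) or path
        entries.setdefault(key, []).append(path)
    return [(key, sorted(members)) for key, members in entries.items()]


def family_fingerprint(shard_hashes: Mapping[str, str]) -> str:
    """Hash every present shard path together with its content hash.

    Adding, removing, or modifying any shard yields a different fingerprint,
    so a cached family result is not reused after such a change.
    """
    h = hashlib.sha256()
    for path in sorted(shard_hashes):
        h.update(path.encode("utf-8") + b"\0")
        h.update(shard_hashes[path].encode("utf-8") + b"\0")
    return h.hexdigest()
```

With this grouping, a directory holding two shards and a config file produces two scan entries rather than three, while the member list of each entry still allows per-shard file counts and metadata to be reported.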

Validation

  • PROMPTFOO_DISABLE_TELEMETRY=1 ./.venv/bin/python -m pytest tests/test_core.py -k "sharded or content_hash_excludes"
  • ./.venv/bin/ruff format --check modelaudit/core.py tests/test_core.py
  • ./.venv/bin/ruff check modelaudit/core.py tests/test_core.py
  • ./.venv/bin/mypy modelaudit/core.py tests/test_core.py
  • git diff --check

Note: local validation used the existing project virtualenv because uv is not installed in this environment.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6c8dbdc9c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/core.py Outdated
@jessejam force-pushed the fix/sharded-directory-dedup branch from f6c8dbd to ce976b1 on May 8, 2026 08:24
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce976b1e19

Comment thread modelaudit/core.py Outdated
Contributor

Thanks for the fix here. I verified the duplicate-scan issue is real and carried the change forward into #1231 so we could add one follow-up before merge: grouped shard families still need path-correct per-shard metadata/assets after deduping the scan work. Closing this in favor of the maintainer follow-up because this branch is not maintainer-editable.

2 participants