[Repo Assist] perf: replace O(N²) linear scan in dedupe_packages with O(N) HashMap #155

Draft
github-actions[bot] wants to merge 1 commit into main from
repo-assist/perf-dedupe-hashmap-2026-04-26-4f6f1c26ddcdbe72

🤖 This PR was created by Repo Assist, an automated AI assistant.

## What

Replaces the O(N²) linear scan in `dedupe_packages` with a `HashMap`-based lookup, making the whole pass O(N).

Before:

```rust
fn dedupe_packages(packages: Vec<Package>) -> Vec<Package> {
    let mut deduped: Vec<Package> = Vec::new();
    for pkg in packages {
        if let Some(existing) = deduped.iter_mut().find(|current| {  // O(N) scan
            current.id == pkg.id && current.source.eq_ignore_ascii_case(&pkg.source)
        }) { ... }
    }
    deduped
}
```

After:

```rust
use std::collections::HashMap;

fn dedupe_packages(packages: Vec<Package>) -> Vec<Package> {
    let mut index: HashMap<(String, String), usize> = HashMap::new();  // O(1) lookup
    let mut deduped: Vec<Package> = Vec::new();
    for pkg in packages {
        let key = (pkg.id.clone(), pkg.source.to_ascii_lowercase());
        match index.get(&key) { ... }
    }
    deduped
}
```

## Why

`dedupe_packages` is called after every `winget list`, `winget search`, and `winget upgrade` result. The previous implementation scanned the accumulating output `Vec` for each input package, which costs O(N²) string comparisons in the worst case (all packages unique).

| Package count | Old comparisons | New lookups |
|---------------|-----------------|-------------|
| 200 | ~20,000 | ~200 |
| 400 | ~80,000 | ~400 |
| 500 | ~125,000 | ~500 |

Users with large `winget list` outputs (common on developer machines) should see a faster initial load.
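The table's figures can be reproduced with a little arithmetic. The sketch below assumes the worst case for the old code, where every package is unique, so the i-th package is compared against the i entries already kept. The function names are hypothetical and exist only for this illustration.

```rust
/// Worst-case comparisons made by the old linear scan for `n` unique
/// packages: 0 + 1 + ... + (n - 1) = n(n - 1) / 2.
fn linear_scan_comparisons(n: u64) -> u64 {
    n * n.saturating_sub(1) / 2
}

/// Hash lookups made by the HashMap version: one per input package.
fn hashmap_lookups(n: u64) -> u64 {
    n
}
```

For 200, 400, and 500 unique packages, `linear_scan_comparisons` gives 19,900, 79,800, and 124,750, which the table rounds to ~20,000, ~80,000, and ~125,000.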

## Behaviour preserved

- The first occurrence of an `(id, source)` pair determines its position in the output.
- If a later entry is strictly better (`prefer_package`), it replaces the earlier one in-place, maintaining its position in the result.
- `source` comparison remains case-insensitive: normalised to `to_ascii_lowercase()` for the HashMap key, matching the previous `eq_ignore_ascii_case` guard.
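The three guarantees above can be seen together in a minimal, self-contained sketch. It is not the code from this PR: the `Package` struct is reduced to three fields, and `prefer_package` here is a hypothetical stand-in (its real signature and criteria are not shown in this description) that treats a candidate as strictly better only when the kept entry has an empty version.

```rust
use std::collections::HashMap;

// Hypothetical, reduced Package; the real struct likely has more fields.
#[derive(Debug, Clone, PartialEq)]
struct Package {
    id: String,
    source: String,
    version: String,
}

// Assumed stand-in for the project's `prefer_package`: the candidate wins
// only if the kept entry lacks a version and the candidate has one.
fn prefer_package(candidate: &Package, existing: &Package) -> bool {
    existing.version.is_empty() && !candidate.version.is_empty()
}

fn dedupe_packages(packages: Vec<Package>) -> Vec<Package> {
    // Maps (id, lowercased source) to the entry's position in `deduped`.
    let mut index: HashMap<(String, String), usize> = HashMap::new();
    let mut deduped: Vec<Package> = Vec::new();
    for pkg in packages {
        let key = (pkg.id.clone(), pkg.source.to_ascii_lowercase());
        match index.get(&key) {
            // Duplicate: replace in place only if strictly better,
            // so the original position is preserved.
            Some(&pos) => {
                if prefer_package(&pkg, &deduped[pos]) {
                    deduped[pos] = pkg;
                }
            }
            // First occurrence: record its position and append.
            None => {
                index.insert(key, deduped.len());
                deduped.push(pkg);
            }
        }
    }
    deduped
}
```

With this stand-in, feeding in `("A.App", "winget", "")`, `("B.App", "msstore", "1.0")`, and `("A.App", "Winget", "2.0")` yields two entries: `A.App` stays at index 0 but now carries version `2.0`, and `B.App` follows at index 1.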

## New tests

- `dedupe_packages_source_comparison_is_case_insensitive` — `"winget"` and `"Winget"` sources are treated as the same
- `dedupe_packages_preserves_insertion_order_of_unique_packages` — three distinct packages emerge in input order

## Test Status

```
cargo check --all-targets   → clean
cargo fmt -- --check        → clean
cargo clippy -- -D warnings → clean
cargo test                  → 232 passed (was 230; 2 new tests)
```

Note

🔒 Integrity filter blocked 1 item

The following item was blocked because it doesn't meet the GitHub integrity level.

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Repo Assist · ● 5.2M ·

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@cbb46ab386962aa371045839fc9998ee4e97ca64

dedupe_packages previously iterated through the growing deduped vec for
every incoming package (iter_mut().find(...)), giving O(N²) complexity.

With a HashMap<(id, source_lowercase), index> keyed on the canonical
identity, duplicate lookup becomes O(1) amortised.

For a typical 'winget list' result with 400 packages this removes
~80k string comparisons; with 500 packages it removes ~125k.

Behaviour is identical:
- First occurrence of a (id, source) pair wins in insertion order.
- If a later entry is strictly better (prefer_package), it replaces the
  earlier one in-place (maintaining position in the output vec).
- source comparison remains case-insensitive: normalised to
  to_ascii_lowercase() for the HashMap key, matching the previous
  eq_ignore_ascii_case guard.

Two new tests added:
- dedupe_packages_source_comparison_is_case_insensitive
- dedupe_packages_preserves_insertion_order_of_unique_packages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>