
feat(memory): multi-query expansion with error-safe recall #2592

Merged
theonlyhennygod merged 2 commits into main from issue-2472-enhanced-recall-safe
Mar 5, 2026

Conversation

@theonlyhennygod
Collaborator

@theonlyhennygod theonlyhennygod commented Mar 2, 2026

Summary

  • implement multi-query keyword expansion retrieval for long prompts
  • preserve fail-fast behavior for primary memory recall errors
  • keep keyword expansion recall best-effort so secondary query failure does not drop primary results
  • add regression tests for failure-mode behavior in retrieval + memory loader paths
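The error semantics in the summary can be sketched as follows. This is a minimal synchronous illustration, not the PR's code: the `Recall` trait, `recall_with_expansion`, and the `FailsOn` test double are hypothetical names, and the real `Memory::recall` is async and also takes a session id.

```rust
/// Hypothetical stand-in for the memory backend (the real trait is async).
pub trait Recall {
    fn recall(&self, query: &str, limit: usize) -> Result<Vec<String>, String>;
}

/// Primary recall errors propagate (fail-fast); keyword recall is best-effort.
pub fn recall_with_expansion<M: Recall>(
    mem: &M,
    query: &str,
    keywords: &str,
    limit: usize,
) -> Result<Vec<String>, String> {
    // `?` returns early if the primary query fails.
    let mut results = mem.recall(query, limit)?;

    // A failed keyword query leaves the primary results untouched.
    if let Ok(extra) = mem.recall(keywords, limit) {
        for entry in extra {
            if !results.contains(&entry) {
                results.push(entry);
            }
        }
    }

    results.truncate(limit);
    Ok(results)
}

/// Hypothetical test double that fails only for one specific query string.
pub struct FailsOn(pub &'static str);

impl Recall for FailsOn {
    fn recall(&self, query: &str, _limit: usize) -> Result<Vec<String>, String> {
        if query == self.0 {
            Err(format!("backend error for {:?}", query))
        } else {
            Ok(vec![format!("hit:{}", query)])
        }
    }
}
```

With `FailsOn("keywords")` the call still returns the primary hit, while `FailsOn("full query")` surfaces the error to the caller.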

Why

Validation

  • Previously validated on equivalent patch set before disk exhaustion:
    • cargo test --lib memory::retrieval::tests -- --nocapture
    • cargo test --lib agent::memory_loader::tests -- --nocapture
    • cargo test --lib agent::loop_::context::tests -- --nocapture
  • Current environment note: local disk reached No space left on device, preventing a final rerun in this worktree.

Closes #2472
Supersedes #2473

Summary by CodeRabbit

  • New Features

    • Enhanced memory retrieval system with multi-query keyword expansion for improved recall accuracy and coverage.
    • Implemented deduplication and scoring mechanisms for memory entry consolidation.
  • Tests

    • Added comprehensive test coverage for keyword extraction, entry merging, and error propagation scenarios.

@theonlyhennygod theonlyhennygod requested a review from chumyin as a code owner March 2, 2026 23:50
@coderabbitai
Contributor

coderabbitai Bot commented Mar 2, 2026

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'tools', 'path_filters', 'review_instructions'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

Introduces multi-query keyword expansion for memory retrieval. A new enhanced_recall() function performs primary recall, extracts keywords from messages ≥30 characters, runs secondary recall if keywords differ, merges and deduplicates results by preserving highest scores, then returns top entries sorted by score. Integration updates two retrieval call sites.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Memory Retrieval Enhancement: src/memory/retrieval.rs, src/memory/mod.rs | New module introducing enhanced_recall() function that wraps Memory::recall() with multi-query expansion. Extracts significant keywords (≥4 characters) from queries ≥30 chars, performs secondary recall, merges results by deduplicating on key (highest score preserved), sorts by score descending, and returns top-N entries. Includes keyword extraction utility, merge logic, and comprehensive test suite covering keyword parsing, deduplication, and integration scenarios. |
| Retrieval Integration: src/agent/loop_/context.rs, src/agent/memory_loader.rs | Updated two retrieval call sites to use enhanced_recall() instead of direct mem.recall(). context.rs gates on early return if no entries found; memory_loader.rs adds new test helper FailingRecallMemory to verify error propagation from primary memory backend. Maintains downstream processing (time-decay, Core boost, min_relevance filtering, truncation, context emission). |
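The merge step described in the table (deduplicate on key, keep the highest score, sort descending, return top-N) might look roughly like the sketch below. The `Entry` struct is a hypothetical, trimmed-down stand-in for the real memory entry type, and `top_n` is an illustrative helper name; only `merge_entries` and the `Option` score handling are taken from the reviewed code.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down memory entry; the real struct likely has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub key: String,
    pub score: Option<f64>,
}

/// Merge `extra` into `results`, keeping one entry per key with the highest score.
pub fn merge_entries(results: &mut Vec<Entry>, extra: Vec<Entry>) {
    // Map each key to its position in `results` for constant-time lookups.
    let mut index: HashMap<String, usize> = results
        .iter()
        .enumerate()
        .map(|(i, e)| (e.key.clone(), i))
        .collect();

    for entry in extra {
        match index.get(&entry.key).copied() {
            Some(i) => {
                // A missing score is treated as 0.0, as in the reviewed sort.
                if entry.score.unwrap_or(0.0) > results[i].score.unwrap_or(0.0) {
                    results[i] = entry;
                }
            }
            None => {
                index.insert(entry.key.clone(), results.len());
                results.push(entry);
            }
        }
    }
}

/// Sort by score descending and keep the top `limit` entries.
pub fn top_n(mut results: Vec<Entry>, limit: usize) -> Vec<Entry> {
    results.sort_by(|a, b| {
        b.score
            .unwrap_or(0.0)
            .partial_cmp(&a.score.unwrap_or(0.0))
            // NaN scores compare as equal instead of panicking.
            .unwrap_or(std::cmp::Ordering::Equal)
    });
    results.truncate(limit);
    results
}
```

Because `sort_by` is stable, entries with equal scores keep their merge order, so primary results win ties against secondary ones.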

Sequence Diagram(s)

sequenceDiagram
    actor Agent
    participant ER as enhanced_recall()
    participant Mem as Memory (Primary)
    participant KE as Keyword Extractor
    participant Mem2 as Memory (Secondary)
    participant Merger as Dedup & Sort

    Agent->>ER: enhanced_recall(query, limit)
    ER->>Mem: recall(full query)
    Mem-->>ER: primary_results
    
    alt query length >= 30 chars
        ER->>KE: extract_keywords(query)
        KE-->>ER: keyword string
        
        alt keywords differ from original
            ER->>Mem2: recall(keywords)
            Mem2-->>ER: secondary_results
            ER->>Merger: merge_entries(primary, secondary)
        else keywords same as original
            Merger-->>ER: primary_results
        end
    else short query
        Merger-->>ER: primary_results
    end
    
    Merger->>Merger: sort by score descending
    Merger->>Merger: truncate to limit
    Merger-->>ER: merged_results
    ER-->>Agent: sorted_entries
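The two `alt` branches in the diagram amount to one decision: whether a secondary query should run at all. A minimal sketch of that gate is below. `expansion_query` is a hypothetical helper name, the constant values 30 and 4 are taken from the walkthrough, and joining keywords with a single space is an assumption because the reviewed snippet is truncated before that point. The sketch counts characters rather than bytes.

```rust
const MIN_EXPANSION_LENGTH: usize = 30;
const MIN_KEYWORD_LENGTH: usize = 4;

/// Keep words of at least MIN_KEYWORD_LENGTH characters, stripped of
/// leading and trailing punctuation, joined by single spaces (assumed).
pub fn extract_keywords(msg: &str) -> String {
    msg.split_whitespace()
        .filter_map(|w| {
            let clean = w.trim_matches(|c: char| !c.is_alphanumeric());
            if clean.chars().count() >= MIN_KEYWORD_LENGTH {
                Some(clean)
            } else {
                None
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Return the secondary query, or None when expansion should be skipped.
pub fn expansion_query(query: &str) -> Option<String> {
    // Short queries skip expansion entirely.
    if query.chars().count() < MIN_EXPANSION_LENGTH {
        return None;
    }
    let keywords = extract_keywords(query);
    // Nothing to add if no keywords survive or they equal the original query.
    if keywords.is_empty() || keywords == query.trim() {
        return None;
    }
    Some(keywords)
}
```

For example, `expansion_query("How do I fix the login bug, please?")` yields `Some("login please")`, while a query made up only of long words returns `None` because the keyword string equals the original.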

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Suggested labels

size: S, risk: medium, agent

Suggested reviewers

  • chumyin
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description covers key sections (Summary, Why, Validation, linked issues) but omits most required template fields (Label Snapshot, Change Metadata, Security Impact, Compatibility, i18n, Human Verification, Side Effects, Rollback Plan, Risks). | Complete the PR description using the repository template: add Label Snapshot (risk/size/scope/module labels), Change Metadata, Security Impact, Compatibility/Migration, Human Verification, Side Effects/Blast Radius, Rollback Plan, and Risks/Mitigations sections. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat(memory): multi-query expansion with error-safe recall' accurately summarizes the main feature: multi-query keyword expansion for memory retrieval with error-safe behavior. |
| Linked Issues check | ✅ Passed | The code changes directly implement all requirements from #2472: enhanced_recall with primary/secondary queries, keyword extraction, deduplication, sorting, truncation, integration into both retrieval paths, and error-safe behavior. |
| Out of Scope Changes check | ✅ Passed | All changes align with #2472 scope: memory retrieval module, keyword expansion logic, integration points, and regression tests. No unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 88.24% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch issue-2472-enhanced-recall-safe

Comment @coderabbitai help to get the list of available commands and usage tips.

@theonlyhennygod theonlyhennygod self-assigned this Mar 2, 2026
@github-actions

github-actions Bot commented Mar 2, 2026

PR intake checks found warnings (non-blocking)

Fast safe checks found advisory issues. CI lint/test/build gates still enforce merge quality.

  • Missing required PR template sections: ## Validation Evidence (required), ## Security Impact (required), ## Privacy and Data Hygiene (required), ## Rollback Plan (required)
  • Incomplete required PR template fields: summary problem, summary why it matters, summary what changed, validation commands, security risk/mitigation, privacy status, rollback plan
  • Missing Linear issue key reference (RMN-<id>, CDV-<id>, or COM-<id>) in PR title/body (recommended for traceability, non-blocking).

Action items:

  1. Complete required PR template sections/fields.
  2. (Recommended) Link this PR to one active Linear issue key (RMN-xxx/CDV-xxx/COM-xxx) for traceability.
  3. Remove tabs, trailing whitespace, and merge conflict markers from added lines.
  4. Re-run local checks before pushing:
    • ./scripts/ci/rust_quality_gate.sh
    • ./scripts/ci/rust_strict_delta_gate.sh
    • ./scripts/ci/docs_quality_gate.sh

Detected Linear keys: none

Run logs: https://github.com/zeroclaw-labs/zeroclaw/actions/runs/22601223025

Detected blocking line issues (sample):

  • none

Detected advisory line issues (sample):

  • none

@github-actions

github-actions Bot commented Mar 2, 2026

Thanks for contributing to ZeroClaw.

For faster review, please ensure:

  • PR template sections are fully completed
  • cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test are included
  • If automation/agents were used heavily, add brief workflow notes
  • Scope is focused (prefer one concern per PR)

See CONTRIBUTING.md and docs/pr-workflow.md for full collaboration rules.

@github-actions github-actions Bot added size: M Auto size: 251-500 non-doc changed lines. risk: medium Auto risk: src/** or dependency/config changes. distinguished contributor Contributor with 50+ merged PRs. memory: retrieval Auto module: memory/retrieval changed. and removed memory Auto scope: src/memory/** changed. labels Mar 2, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/memory/retrieval.rs (1)

28-30: Make secondary recall fallback observable

At Line 28, secondary recall failures are silently swallowed. Keeping primary results is correct, but logging this fallback will make production debugging safer.

💡 Proposed improvement
-            if let Ok(extra) = mem.recall(&keywords, limit, session_id).await {
-                merge_entries(&mut results, extra);
-            }
+            match mem.recall(&keywords, limit, session_id).await {
+                Ok(extra) => merge_entries(&mut results, extra),
+                Err(err) => {
+                    tracing::debug!(
+                        error = %err,
+                        "keyword expansion recall failed; continuing with primary recall results"
+                    );
+                }
+            }

As per coding guidelines, "document fallback behavior when fallback is intentional and safe".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/memory/retrieval.rs` around lines 28 - 30, The secondary recall call to
mem.recall(&keywords, limit, session_id).await currently ignores Err cases;
update the block around mem.recall so that failures are logged (including error
details and context like keywords/ session_id/limit) while still preserving
primary results and calling merge_entries(&mut results, extra) on Ok; reference
the mem.recall invocation and merge_entries/results to locate the change and use
the existing logging facility (or add one nearby) to record the fallback error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/memory/retrieval.rs`:
- Around line 25-52: The length checks use byte counts instead of character
counts: replace usages of query.len() in the expansion check (the block that
compares to MIN_EXPANSION_LENGTH) and clean.len() inside fn extract_keywords to
use character counts (query.chars().count() and clean.chars().count()) so
Unicode characters (e.g., emoji, CJK) are measured correctly; keep the same
comparisons to MIN_EXPANSION_LENGTH and MIN_KEYWORD_LENGTH and preserve the
surrounding logic in extract_keywords and the recall/merge flow.

---

Nitpick comments:
In `@src/memory/retrieval.rs`:
- Around line 28-30: The secondary recall call to mem.recall(&keywords, limit,
session_id).await currently ignores Err cases; update the block around
mem.recall so that failures are logged (including error details and context like
keywords/ session_id/limit) while still preserving primary results and calling
merge_entries(&mut results, extra) on Ok; reference the mem.recall invocation
and merge_entries/results to locate the change and use the existing logging
facility (or add one nearby) to record the fallback error.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b21a1a9 and d80b535.

📒 Files selected for processing (4)
  • src/agent/loop_/context.rs
  • src/agent/memory_loader.rs
  • src/memory/mod.rs
  • src/memory/retrieval.rs

Comment thread src/memory/retrieval.rs
Comment on lines +25 to +52
    if query.len() >= MIN_EXPANSION_LENGTH {
        let keywords = extract_keywords(query);
        if !keywords.is_empty() && keywords != query.trim() {
            if let Ok(extra) = mem.recall(&keywords, limit, session_id).await {
                merge_entries(&mut results, extra);
            }
        }
    }

    // Sort by score descending, take top `limit`
    results.sort_by(|a, b| {
        b.score
            .unwrap_or(0.0)
            .partial_cmp(&a.score.unwrap_or(0.0))
            .unwrap_or(std::cmp::Ordering::Equal)
    });
    results.truncate(limit);

    Ok(results)
}

/// Extract significant keywords (length >= 4) from a message.
fn extract_keywords(msg: &str) -> String {
    msg.split_whitespace()
        .filter_map(|w| {
            let clean = w.trim_matches(|c: char| !c.is_alphanumeric());
            if clean.len() >= MIN_KEYWORD_LENGTH {
                Some(clean)
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the exact code at the flagged lines
head -60 src/memory/retrieval.rs | tail -40

Repository: zeroclaw-labs/zeroclaw

Length of output: 1244


🏁 Script executed:

# Find the constant definitions
rg -n 'MIN_EXPANSION_LENGTH|MIN_KEYWORD_LENGTH' src/memory/retrieval.rs -B2 -A2

Repository: zeroclaw-labs/zeroclaw

Length of output: 782


🏁 Script executed:

# Check if there are any Unicode/non-ASCII related tests
rg -n 'unicode|emoji|cjk|multibyte|non.?ascii|utf|中文|日本語|한국어|𝗨𝗻𝗶𝗰𝗼𝗱𝗲' src/memory/retrieval.rs

Repository: zeroclaw-labs/zeroclaw

Length of output: 48


🏁 Script executed:

# Look at the complete extract_keywords function
sed -n '47,60p' src/memory/retrieval.rs

Repository: zeroclaw-labs/zeroclaw

Length of output: 430


Use character-based thresholds instead of byte length

Constants MIN_EXPANSION_LENGTH and MIN_KEYWORD_LENGTH are documented as character-based, but lines 25 and 51 use len() which counts UTF-8 bytes. This causes incorrect behavior with non-ASCII input (e.g., emoji, CJK characters take 2–4 bytes per character but should count as 1).

Replace query.len() with query.chars().count() and clean.len() with clean.chars().count().
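The difference between the two checks can be seen in a small sketch. The helper names `passes_byte_check` and `passes_char_check` are hypothetical and exist only for this illustration; the threshold of 4 is the keyword length stated in the walkthrough.

```rust
const MIN_KEYWORD_LENGTH: usize = 4;

/// Mirrors the reviewed code: `str::len` returns the UTF-8 byte length.
pub fn passes_byte_check(word: &str) -> bool {
    word.len() >= MIN_KEYWORD_LENGTH
}

/// Mirrors the suggested fix: count Unicode scalar values instead.
pub fn passes_char_check(word: &str) -> bool {
    word.chars().count() >= MIN_KEYWORD_LENGTH
}
```

A two-character word such as "中文" occupies 6 bytes, so it passes the byte check but fails the character check; an ASCII word such as "rust" passes both. Note that `chars().count()` counts Unicode scalar values, not grapheme clusters, so a combining sequence can still count as more than one character.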

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/memory/retrieval.rs` around lines 25 - 52, The length checks use byte
counts instead of character counts: replace usages of query.len() in the
expansion check (the block that compares to MIN_EXPANSION_LENGTH) and
clean.len() inside fn extract_keywords to use character counts
(query.chars().count() and clean.chars().count()) so Unicode characters (e.g.,
emoji, CJK) are measured correctly; keep the same comparisons to
MIN_EXPANSION_LENGTH and MIN_KEYWORD_LENGTH and preserve the surrounding logic
in extract_keywords and the recall/merge flow.

@theonlyhennygod theonlyhennygod merged commit dc514cf into main Mar 5, 2026
28 of 31 checks passed
@theonlyhennygod theonlyhennygod deleted the issue-2472-enhanced-recall-safe branch March 5, 2026 06:53

Labels

agent Auto scope: src/agent/** changed. distinguished contributor Contributor with 50+ merged PRs. memory: retrieval Auto module: memory/retrieval changed. risk: medium Auto risk: src/** or dependency/config changes. size: M Auto size: 251-500 non-doc changed lines.


Development

Successfully merging this pull request may close these issues.

[Feature]:Multi-query keyword expansion for memory retrieval

1 participant