feat(memory): multi-query expansion with error-safe recall #2592

theonlyhennygod merged 2 commits into main from issue-2472-enhanced-recall-safe
Conversation
Note
| Cohort / File(s) | Summary |
|---|---|
| Memory Retrieval Enhancement<br>`src/memory/retrieval.rs`, `src/memory/mod.rs` | New module introducing an `enhanced_recall()` function that wraps `Memory::recall()` with multi-query expansion. Extracts significant keywords (≥4 characters) from queries ≥30 characters, performs a secondary recall, merges results by deduplicating on key (highest score preserved), sorts by score descending, and returns the top-N entries. Includes a keyword extraction utility, merge logic, and a comprehensive test suite covering keyword parsing, deduplication, and integration scenarios. |
| Retrieval Integration<br>`src/agent/loop_/context.rs`, `src/agent/memory_loader.rs` | Updates two retrieval call sites to use `enhanced_recall()` instead of direct `mem.recall()`. `context.rs` gates on an early return if no entries are found; `memory_loader.rs` adds a new test helper `FailingRecallMemory` to verify error propagation from the primary memory backend. Maintains downstream processing (time-decay, Core boost, `min_relevance` filtering, truncation, context emission). |
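The dedup-and-merge behavior summarized above can be sketched as follows. This is a minimal illustration only: the `RecallEntry` shape with `key`/`score` fields is an assumption for the example, not the crate's actual type.

```rust
use std::collections::{hash_map::Entry, HashMap};

// Hypothetical entry shape for illustration only.
#[derive(Clone, Debug, PartialEq)]
struct RecallEntry {
    key: String,
    score: Option<f64>,
}

/// Merge `secondary` into `primary`, deduplicating on `key` (the entry
/// with the higher score wins) and sorting by score descending.
fn merge_entries(primary: &mut Vec<RecallEntry>, secondary: Vec<RecallEntry>) {
    let mut best: HashMap<String, RecallEntry> = HashMap::new();
    for e in primary.drain(..).chain(secondary) {
        match best.entry(e.key.clone()) {
            Entry::Occupied(mut slot) => {
                if e.score.unwrap_or(0.0) > slot.get().score.unwrap_or(0.0) {
                    slot.insert(e);
                }
            }
            Entry::Vacant(slot) => {
                slot.insert(e);
            }
        }
    }
    let mut merged: Vec<RecallEntry> = best.into_values().collect();
    // Missing scores are treated as 0.0, matching the sort used in the PR.
    merged.sort_by(|a, b| {
        b.score
            .unwrap_or(0.0)
            .partial_cmp(&a.score.unwrap_or(0.0))
            .unwrap_or(std::cmp::Ordering::Equal)
    });
    *primary = merged;
}

fn main() {
    let mut primary = vec![
        RecallEntry { key: "a".into(), score: Some(0.5) },
        RecallEntry { key: "b".into(), score: Some(0.9) },
    ];
    let secondary = vec![
        RecallEntry { key: "a".into(), score: Some(0.8) },
        RecallEntry { key: "c".into(), score: None },
    ];
    merge_entries(&mut primary, secondary);
    // "a" keeps its higher secondary score; resulting order is b, a, c.
    println!("{primary:?}");
}
```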
Sequence Diagram(s)
```mermaid
sequenceDiagram
    actor Agent
    participant ER as enhanced_recall()
    participant Mem as Memory (Primary)
    participant KE as Keyword Extractor
    participant Mem2 as Memory (Secondary)
    participant Merger as Dedup & Sort
    Agent->>ER: enhanced_recall(query, limit)
    ER->>Mem: recall(full query)
    Mem-->>ER: primary_results
    alt query length >= 30 chars
        ER->>KE: extract_keywords(query)
        KE-->>ER: keyword string
        alt keywords differ from original
            ER->>Mem2: recall(keywords)
            Mem2-->>ER: secondary_results
            ER->>Merger: merge_entries(primary, secondary)
        else keywords same as original
            Merger-->>ER: primary_results
        end
    else short query
        Merger-->>ER: primary_results
    end
    Merger->>Merger: sort by score descending
    Merger->>Merger: truncate to limit
    Merger-->>ER: merged_results
    ER-->>Agent: sorted_entries
```
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related issues
- [Feature]: Core memory category boost in context retrieval #2378: Modifies memory retrieval in `build_context()` and `DefaultMemoryLoader` with a Core-category score boost during post-retrieval ranking; this PR refactors those same call sites to use enhanced multi-query recall before downstream re-ranking.
Suggested labels
size: S, risk: medium, agent
Suggested reviewers
- chumyin
Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description covers key sections (Summary, Why, Validation, linked issues) but omits most required template fields (Label Snapshot, Change Metadata, Security Impact, Compatibility, i18n, Human Verification, Side Effects, Rollback Plan, Risks). | Complete the PR description using the repository template: add Label Snapshot (risk/size/scope/module labels), Change Metadata, Security Impact, Compatibility/Migration, Human Verification, Side Effects/Blast Radius, Rollback Plan, and Risks/Mitigations sections. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat(memory): multi-query expansion with error-safe recall' accurately summarizes the main feature: multi-query keyword expansion for memory retrieval with error-safe behavior. |
| Linked Issues check | ✅ Passed | The code changes directly implement all requirements from #2472: enhanced_recall with primary/secondary queries, keyword extraction, deduplication, sorting, truncation, integration into both retrieval paths, and error-safe behavior. |
| Out of Scope Changes check | ✅ Passed | All changes align with #2472 scope: memory retrieval module, keyword expansion logic, integration points, and regression tests. No unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 88.24%, which is sufficient. The required threshold is 80.00%. |
PR intake checks found warnings (non-blocking). Fast safe checks found advisory issues; CI lint/test/build gates still enforce merge quality.

Action items:
- Detected Linear keys: none
- Run logs: https://github.com/zeroclaw-labs/zeroclaw/actions/runs/22601223025
- Detected blocking line issues (sample):
- Detected advisory line issues (sample):
Thanks for contributing to ZeroClaw. For faster review, please ensure:
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/memory/retrieval.rs (1)
28-30: Make secondary recall fallback observable

At line 28, secondary recall failures are silently swallowed. Keeping the primary results is correct, but logging this fallback will make production debugging safer.
💡 Proposed improvement

```diff
-        if let Ok(extra) = mem.recall(&keywords, limit, session_id).await {
-            merge_entries(&mut results, extra);
-        }
+        match mem.recall(&keywords, limit, session_id).await {
+            Ok(extra) => merge_entries(&mut results, extra),
+            Err(err) => {
+                tracing::debug!(
+                    error = %err,
+                    "keyword expansion recall failed; continuing with primary recall results"
+                );
+            }
+        }
```

As per coding guidelines, "document fallback behavior when fallback is intentional and safe".
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/memory/retrieval.rs` around lines 28 - 30, the secondary recall call to mem.recall(&keywords, limit, session_id).await currently ignores Err cases; update the block around mem.recall so that failures are logged (including error details and context like keywords/session_id/limit) while still preserving primary results and calling merge_entries(&mut results, extra) on Ok; reference the mem.recall invocation and merge_entries/results to locate the change and use the existing logging facility (or add one nearby) to record the fallback error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/memory/retrieval.rs`:
- Around line 25-52: The length checks use byte counts instead of character
counts: replace usages of query.len() in the expansion check (the block that
compares to MIN_EXPANSION_LENGTH) and clean.len() inside fn extract_keywords to
use character counts (query.chars().count() and clean.chars().count()) so
Unicode characters (e.g., emoji, CJK) are measured correctly; keep the same
comparisons to MIN_EXPANSION_LENGTH and MIN_KEYWORD_LENGTH and preserve the
surrounding logic in extract_keywords and the recall/merge flow.
---
Nitpick comments:
In `@src/memory/retrieval.rs`:
- Around line 28-30: The secondary recall call to mem.recall(&keywords, limit,
session_id).await currently ignores Err cases; update the block around
mem.recall so that failures are logged (including error details and context like
keywords/ session_id/limit) while still preserving primary results and calling
merge_entries(&mut results, extra) on Ok; reference the mem.recall invocation
and merge_entries/results to locate the change and use the existing logging
facility (or add one nearby) to record the fallback error.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)

- `src/agent/loop_/context.rs`
- `src/agent/memory_loader.rs`
- `src/memory/mod.rs`
- `src/memory/retrieval.rs`
```rust
    if query.len() >= MIN_EXPANSION_LENGTH {
        let keywords = extract_keywords(query);
        if !keywords.is_empty() && keywords != query.trim() {
            if let Ok(extra) = mem.recall(&keywords, limit, session_id).await {
                merge_entries(&mut results, extra);
            }
        }
    }

    // Sort by score descending, take top `limit`
    results.sort_by(|a, b| {
        b.score
            .unwrap_or(0.0)
            .partial_cmp(&a.score.unwrap_or(0.0))
            .unwrap_or(std::cmp::Ordering::Equal)
    });
    results.truncate(limit);

    Ok(results)
}

/// Extract significant keywords (length >= 4) from a message.
fn extract_keywords(msg: &str) -> String {
    msg.split_whitespace()
        .filter_map(|w| {
            let clean = w.trim_matches(|c: char| !c.is_alphanumeric());
            if clean.len() >= MIN_KEYWORD_LENGTH {
                Some(clean)
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's examine the exact code at the flagged lines
head -60 src/memory/retrieval.rs | tail -40
```

Repository: zeroclaw-labs/zeroclaw

Length of output: 1244
🏁 Script executed:

```shell
# Find the constant definitions
rg -n 'MIN_EXPANSION_LENGTH|MIN_KEYWORD_LENGTH' src/memory/retrieval.rs -B2 -A2
```

Repository: zeroclaw-labs/zeroclaw

Length of output: 782
🏁 Script executed:

```shell
# Check if there are any Unicode/non-ASCII related tests
rg -n 'unicode|emoji|cjk|multibyte|non.?ascii|utf|中文|日本語|한국어' src/memory/retrieval.rs
```

Repository: zeroclaw-labs/zeroclaw

Length of output: 48
🏁 Script executed:

```shell
# Look at the complete extract_keywords function
sed -n '47,60p' src/memory/retrieval.rs
```

Repository: zeroclaw-labs/zeroclaw

Length of output: 430
Use character-based thresholds instead of byte length
Constants `MIN_EXPANSION_LENGTH` and `MIN_KEYWORD_LENGTH` are documented as character-based, but lines 25 and 51 use `len()`, which counts UTF-8 bytes. This causes incorrect behavior with non-ASCII input (e.g., emoji and CJK characters take 2–4 bytes per character but should count as 1).
Replace `query.len()` with `query.chars().count()` and `clean.len()` with `clean.chars().count()`.
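The byte-versus-character distinction is easy to demonstrate with a standalone snippet (illustrative only):

```rust
fn main() {
    let ascii = "memo";   // 4 chars, 4 bytes
    let cjk = "記憶検索"; // 4 chars, 12 bytes (3 bytes per char)

    assert_eq!(ascii.len(), 4);
    assert_eq!(ascii.chars().count(), 4);
    assert_eq!(cjk.len(), 12);          // byte length
    assert_eq!(cjk.chars().count(), 4); // character count

    // With a byte-based `len() >= 4` check, a 2-character CJK word
    // (6 bytes) incorrectly passes a 4-character keyword threshold.
    let two_chars = "記憶";
    assert!(two_chars.len() >= 4);
    assert_eq!(two_chars.chars().count(), 2);

    println!("byte vs char checks passed");
}
```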
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/memory/retrieval.rs` around lines 25 - 52, The length checks use byte
counts instead of character counts: replace usages of query.len() in the
expansion check (the block that compares to MIN_EXPANSION_LENGTH) and
clean.len() inside fn extract_keywords to use character counts
(query.chars().count() and clean.chars().count()) so Unicode characters (e.g.,
emoji, CJK) are measured correctly; keep the same comparisons to
MIN_EXPANSION_LENGTH and MIN_KEYWORD_LENGTH and preserve the surrounding logic
in extract_keywords and the recall/merge flow.
Summary

Why

`main`

Validation

- `cargo test --lib memory::retrieval::tests -- --nocapture`
- `cargo test --lib agent::memory_loader::tests -- --nocapture`
- `cargo test --lib agent::loop_::context::tests -- --nocapture`
- No space left on device, preventing a final rerun in this worktree.

Closes #2472

Supersedes #2473
Summary by CodeRabbit
New Features
Tests