
fix(session-search): strip FTS5 operators from truncation query terms #18692

Open
liuhao1024 wants to merge 1 commit into
NousResearch:main from
liuhao1024:fix/session-search-fts5-operators


@liuhao1024
Contributor

Problem

_truncate_around_matches() treats FTS5 boolean operators (AND, OR, NOT) as content terms when splitting the query. Since common English words like "and" appear in virtually every conversation, this produces false match positions and mis-centered truncation windows.

Similarly, FTS5 syntax like NEAR(...), column filters (role:user), and special characters (+, *, ^) pollute the search terms.
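
To illustrate the failure mode, here is a minimal sketch, assuming the pre-fix behavior amounts to splitting the query on whitespace. `naive_terms` and `find_positions` are hypothetical stand-ins, not the actual code in `_truncate_around_matches()`.

```python
def naive_terms(query: str) -> list[str]:
    """Hypothetical stand-in for the pre-fix behavior: split on whitespace."""
    return query.lower().split()


def find_positions(text: str, terms: list[str]) -> list[int]:
    """Return the sorted start offsets of every occurrence of any term."""
    lowered = text.lower()
    positions = []
    for term in terms:
        start = lowered.find(term)
        while start != -1:
            positions.append(start)
            start = lowered.find(term, start + 1)
    return sorted(positions)


text = "We looked at logs and metrics and then fixed the docker build."
terms = naive_terms("docker AND kubernetes")
positions = find_positions(text, terms)

# Only one hit is a real content match; the others come from the word "and".
real_hits = [p for p in positions if text[p:].startswith("docker")]
noise_hits = [p for p in positions if text[p:].startswith("and")]
```

In this example the operator contributes more match positions than the actual content term, which is what pulls the truncation window away from the relevant text.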

Fix

Add _strip_fts5_operators() helper that extracts plain content terms from an FTS5 query by removing:

  • Boolean operators (AND, OR, NOT) — case-insensitive, word-boundary aware
  • NEAR(...) clauses
  • Column filters (e.g., role:user)
  • FTS5 special characters (+, {}, (), ^, ~, *)
  • Quoted-phrase delimiters (content preserved, quotes removed)
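
The stripping rules above could be sketched roughly as follows. This is not the code from the PR: the function name `strip_fts5_operators_sketch`, the regular expressions, and the order of the passes are all assumptions. In particular, it assumes that a `NEAR(...)` clause and a `column:value` filter are each dropped whole, including their contents.

```python
import re

# Assumed patterns; the real helper may differ.
_NEAR = re.compile(r"\bNEAR\s*\([^)]*\)", re.IGNORECASE)    # NEAR(a b, 5)
_COLUMN_FILTER = re.compile(r"\b\w+:\S+")                    # role:user
_SPECIAL = re.compile(r'[+{}()^~*"]')                        # operators and quotes
_BOOLEAN = re.compile(r"\b(?:AND|OR|NOT)\b", re.IGNORECASE)  # whole words only


def strip_fts5_operators_sketch(query: str) -> str:
    """Return the plain content terms of an FTS5 query as one string."""
    cleaned = _NEAR.sub(" ", query)             # drop NEAR(...) clauses
    cleaned = _COLUMN_FILTER.sub(" ", cleaned)  # drop column filters
    cleaned = _SPECIAL.sub(" ", cleaned)        # drop special chars, keep phrase text
    cleaned = _BOOLEAN.sub(" ", cleaned)        # drop AND / OR / NOT
    return " ".join(cleaned.split())            # collapse whitespace


print(strip_fts5_operators_sketch('role:user "deploy script" AND docker*'))
```

The word-boundary anchors keep terms such as `android` intact. One limitation of this sketch is that the column-filter pattern would also swallow any other token containing a colon, such as a URL.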

The cleaned query is used for all three matching strategies:

  1. Full-phrase search
  2. Proximity co-occurrence (within 200 chars)
  3. Individual term fallback

If stripping removes everything (the query consisted purely of operators), the function falls back to the raw query.
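
The three strategies and the raw-query fallback might fit together as in the sketch below. The function `match_positions_sketch`, its parameters, and the exact proximity rule (occurrences of the first term that have every other term within `window` characters) are assumptions for illustration. Only the strategy order and the 200-character window come from the description above.

```python
def find_all(haystack: str, needle: str) -> list[int]:
    """Return the start offsets of every occurrence of needle in haystack."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def match_positions_sketch(text: str, raw_query: str, cleaned_query: str,
                           window: int = 200) -> list[int]:
    # Fall back to the raw query if stripping removed everything.
    query = (cleaned_query or raw_query).lower().strip()
    if not query:
        return []
    lowered = text.lower()

    # 1. Full-phrase search.
    hits = find_all(lowered, query)
    if hits:
        return hits

    terms = query.split()
    per_term = {term: find_all(lowered, term) for term in terms}

    # 2. Proximity co-occurrence: every term appears within `window` chars.
    if len(terms) > 1 and all(per_term.values()):
        close = [
            p for p in per_term[terms[0]]
            if all(any(abs(p - q) <= window for q in per_term[t])
                   for t in terms[1:])
        ]
        if close:
            return close

    # 3. Individual term fallback.
    return sorted(p for found in per_term.values() for p in found)
```

A caller would pass the original FTS5 query as `raw_query` and the output of the stripping helper as `cleaned_query`, then center the truncation window on the returned positions.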

Tests

  • 14 new tests for _strip_fts5_operators() covering all operator types
  • 4 new integration tests for _truncate_around_matches() with FTS5 queries
  • All 25 existing + new tests pass

Fixes #4238, Fixes #4239

When _truncate_around_matches received an FTS5 query with boolean
operators (AND, OR, NOT), those operators were split into individual
terms and searched for in the conversation text. Since common English
words like 'and' appear everywhere, this produced noisy match positions
and mis-centered truncation windows.

Add _strip_fts5_operators() helper that removes:
- Boolean operators (AND, OR, NOT) — case-insensitive
- NEAR(...) clauses
- Column filters (e.g. role:user)
- FTS5 special characters (+, {}, (), ^, ~, *)
- Quoted-phrase delimiters (content preserved)

The cleaned query is used for all three matching strategies (phrase,
proximity co-occurrence, individual terms).

Fixes NousResearch#4238, NousResearch#4239
alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/tools (Tool registry, model_tools, toolsets) labels on May 2, 2026
@alt-glitch
Collaborator

Note: This PR appears to be a superset of #18690 (same fix scope + more). Both target #4238; this one also covers #4239 with NEAR(), column filters, and special char stripping.

Cyrene963 pushed a commit to Cyrene963/hermes-agent that referenced this pull request May 3, 2026
Community PRs applied:
- NousResearch#18596: Enable secret redaction by default (SECURITY)
- NousResearch#18650: Sanitize malformed tool messages + auto-recover on API 400
- NousResearch#18607: Emergency compression before max_iterations exhaustion
- NousResearch#18603: Compression fallback to main model on 413 rate limit
- NousResearch#18638: Pass threshold_percent on model switch
- NousResearch#18663: Strip extra_content from tool_calls for strict APIs
- NousResearch#18618: Forward explicit_api_key to OpenRouter
- NousResearch#18632: Show cache tokens in /insights breakdown
- NousResearch#18614: Add idempotency guard for patch duplicate loops
- NousResearch#18600: Raise ValueError when HERMES_HOME unset in profile mode
- NousResearch#18616: Allow ZWJ emoji in context files
- NousResearch#18582: Reload .env on /restart
- NousResearch#18547: Stabilize system prompt prefix for KV cache reuse
- NousResearch#18692: Strip FTS5 operators from session search truncation terms

Fix: Add order_by_last_active=True to list_sessions_rich call
(pre-existing commit 142b4bf code sync)
