perf: Optimize response_cache_by_prompt with inverted index#2321
Merged
crivetimihai merged 4 commits into main on Jan 25, 2026
Conversation
Signed-off-by: Shoumi <shoumimukherjee@gmail.com>
Add comprehensive test coverage for the inverted index optimization:
- Tokenization and vectorization functions
- Basic cache store and hit functionality
- Inverted index population and candidate filtering
- Eviction and index rebuild scenarios
- Max entries cap with index consistency
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Force-pushed e22fd14 to 66e732c
crivetimihai approved these changes on Jan 25, 2026
kcostell06 pushed a commit to kcostell06/mcp-context-forge that referenced this pull request on Feb 24, 2026
* optimize response_cache_by_prompt lookup with inverted index
* fix type hint
* flake8 fixes
* test: add unit tests for response_cache_by_prompt inverted index

Add comprehensive test coverage for the inverted index optimization:
- Tokenization and vectorization functions
- Basic cache store and hit functionality
- Inverted index population and candidate filtering
- Eviction and index rebuild scenarios
- Max entries cap with index consistency

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
🚀 Performance Improvement PR
📌 Summary
Optimizes response_cache_by_prompt plugin lookup performance from O(n) to O(k·m) by implementing an inverted index, achieving sublinear scaling with cache size (n = cached entries, k = tokens in the query prompt, m = candidate entries sharing at least one token).
Complexity Breakdown:
Key Achievement: While the precise complexity is O(k·m), this delivers sublinear scaling because m grows much slower than n. A 10x cache increase might only cause a 2-3x increase in m.
🔗 Related Issue
Closes: #1835
📈 Root Cause
The plugin performed a linear scan comparing cosine similarity against all cached entries on every lookup. As cache size grew, CPU usage and latency increased linearly, dominating request processing time.
Evidence:
_find_best() vectorized the input and computed cosine similarity against all n cache entries per request (O(n) complexity).
🔧 Solution
Implemented an inverted index (token → entry indices mapping) to filter candidates before computing expensive cosine similarity:
- _Entry now stores its token set for efficient index updates
- A tool → token → Set[int] mapping gives O(1) candidate lookup per token
Critical Bug Fixed
The eviction logic was storing old bucket indices (i, e) and then rebuilding the bucket, causing index-to-entry misalignment. Fixed by removing index tracking and rebuilding the index from scratch after eviction.
📊 Performance Impact
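To make the impact concrete, here is a self-contained micro-benchmark sketch on synthetic data (every parameter below is an arbitrary assumption, not a measurement from this PR); it verifies that index probing finds exactly the entries a linear scan would, while touching far fewer of them:

```python
import random
import string
import time
from collections import defaultdict

random.seed(42)
VOCAB = ["".join(random.choices(string.ascii_lowercase, k=5)) for _ in range(2000)]

def make_prompt() -> str:
    return " ".join(random.choices(VOCAB, k=8))

entries = [make_prompt() for _ in range(5000)]

# Build the inverted index: token -> set of entry indices.
index: defaultdict[str, set[int]] = defaultdict(set)
for i, prompt in enumerate(entries):
    for token in prompt.split():
        index[token].add(i)

query = entries[1234]  # guaranteed hit
query_tokens = set(query.split())

# Linear scan: touches all n entries.
t0 = time.perf_counter()
linear = [i for i, p in enumerate(entries) if query_tokens & set(p.split())]
t_linear = time.perf_counter() - t0

# Index lookup: k probes yield only the m candidate entries.
t0 = time.perf_counter()
candidates: set[int] = set()
for token in query_tokens:
    candidates |= index[token]
t_index = time.perf_counter() - t0

assert candidates == set(linear)  # same results, far fewer entries examined
print(f"linear: {t_linear * 1e3:.2f} ms, index: {t_index * 1e3:.2f} ms, "
      f"n={len(entries)}, m={len(candidates)}")
```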
Complexity Analysis:
Why This Achieves Sublinear Scaling:
With good token distribution, m grows much slower than n:
Example Calculation:
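As an illustrative back-of-envelope calculation consistent with the claims above (every number below is hypothetical, not a measurement from this PR):

```python
# Hypothetical figures, for illustration only.
n_small, m_small = 1_000, 50      # cache entries vs. post-filter candidates
n_large, m_large = 10_000, 150    # 10x more entries, sublinear candidate growth

print(n_small / m_small)   # 20.0 -> linear scan does 20x more comparisons
print(n_large / m_large)   # ~66.7 -> the gap widens as the cache grows
print(m_large / m_small)   # 3.0 -> a 10x cache increase, only 3x more candidates
```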
Expected Improvement:
📄 Changes
Modified: plugins/response_cache_by_prompt/response_cache_by_prompt.py
- Added _index: Dict[str, Dict[str, Set[int]]] inverted index data structure
- Added tokens: set[str] field to the _Entry dataclass for efficient lookup
- Updated _find_best() to use index-based candidate filtering instead of a linear scan
- Updated tool_post_invoke() with proper index maintenance and rebuild logic
✅ Acceptance Criteria