Fix(llm): complete response cache — set_model invalidation, stats logging, sync mutex#290

Merged

ilblackdragon merged 3 commits into nearai:main from nlok5923:feat/llm-response-cache-170 on Mar 6, 2026

Conversation

@nlok5923 (Contributor) commented on Feb 21, 2026

Summary

Completes the LLM response cache (CachedProvider) for issue #170. The
decorator and wiring were already in place; three acceptance criteria were missing:

  • set_model() relies on per-model cache key isolation. Cache keys embed
    active_model_name() via effective_model_name(), so switching models
    automatically routes requests to a separate cache slot. Entries for the old
    model are preserved: if you switch back, they are hit again rather than
    wasted. Natural TTL and LRU eviction clean up stale entries, so no explicit
    cache clear is performed on model switch (see the sketch after this list).

  • Periodic stats logging was missing. complete() now emits a tracing::info!
    line every 100 requests with total_requests, total_hits,
    hit_rate_pct, and entry_count. Stats are emitted even when the inner
    provider returns an error, so milestone intervals are never silently
    skipped.

  • Missing set_model() tests. Two new tests cover the success path
    (per-model key isolation, cache preserved and reused on switch-back) and
    the failure path (cache preserved on error).
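
A minimal sketch of the key-isolation idea from the first bullet. The shape follows the discussion in this PR, but the exact signatures in src/llm/response_cache.rs are assumptions here:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Hypothetical sketch: the key embeds the effective model name, so each
    // model gets its own key space and no explicit clear is needed when
    // set_model() switches models.
    fn cache_key(effective_model: &str, request_body: &str) -> String {
        let mut hasher = DefaultHasher::new();
        request_body.hash(&mut hasher);
        format!("{effective_model}:{:016x}", hasher.finish())
    }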

Structural change

Switched the cache lock from tokio::sync::Mutex to std::sync::Mutex.
The lock is never held across an .await: both lock scopes in complete()
contain only CPU-bound HashMap operations, with the real .await in between.
This also lets synchronous code such as set_model() (a plain fn in the
LlmProvider trait) access the cache directly when needed, and makes len(),
is_empty(), total_hits(), and clear() synchronous.
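
A simplified sketch of that lock discipline, with stand-in types rather than the PR's actual signatures; the guard is confined to a block and dropped before the .await:

    use std::collections::HashMap;
    use std::sync::Mutex;

    // Sketch only: the std::sync::Mutex guard never lives across an .await,
    // so a non-async mutex is safe inside this async fn.
    async fn complete_cached(
        cache: &Mutex<HashMap<String, String>>,
        key: String,
        fetch: impl std::future::Future<Output = String>,
    ) -> String {
        {
            let guard = cache.lock().unwrap();
            if let Some(hit) = guard.get(&key) {
                return hit.clone();
            }
        } // guard dropped here, before the await below
        let response = fetch.await; // no lock is held across this await
        cache.lock().unwrap().insert(key, response.clone());
        response
    }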

Added a total_hit_count: AtomicU64 field that is incremented on every cache
hit and never decremented on eviction. maybe_log_stats uses this counter
instead of summing per-entry hit_count values, so hit_rate_pct stays
accurate under high eviction pressure.
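
Sketched below with assumed names (total_hit_count and request_count mirror the PR text; the surrounding struct is illustrative, not the actual CachedProvider):

    use std::sync::atomic::{AtomicU64, Ordering};

    // Illustrative only: hits are counted in an atomic that eviction never
    // touches, so the reported hit rate cannot drift downward as entries expire.
    struct CacheStats {
        request_count: AtomicU64,
        total_hit_count: AtomicU64,
    }

    impl CacheStats {
        fn record_hit(&self) {
            self.total_hit_count.fetch_add(1, Ordering::Relaxed);
        }

        fn hit_rate_pct(&self) -> f64 {
            let requests = self.request_count.load(Ordering::Relaxed).max(1);
            let hits = self.total_hit_count.load(Ordering::Relaxed);
            hits as f64 / requests as f64 * 100.0
        }
    }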

Test plan

  • cargo test response_cache — 20/20 pass, 0 warnings
  • cargo clippy --all --benches --tests --examples --all-features — clean
  • RESPONSE_CACHE_ENABLED=true wires CachedProvider in build_provider_chain() (src/llm/mod.rs:415); a sketch follows this list
  • Cache disabled by default — parse_optional_env("RESPONSE_CACHE_ENABLED", false) (src/config/llm.rs:237)
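
A sketch of that decorator wiring. build_provider_chain() and the env flag are the repo's names from the list above; the stand-in trait and body below are assumptions, not the actual code:

    // Minimal stand-ins; the real trait lives in src/llm.
    trait LlmProvider {}

    struct CachedProvider {
        inner: Box<dyn LlmProvider>,
    }

    impl CachedProvider {
        fn new(inner: Box<dyn LlmProvider>) -> Self {
            Self { inner }
        }
    }

    impl LlmProvider for CachedProvider {}

    // Assumed shape: the cache is an opt-in decorator around whatever
    // provider the chain would otherwise return.
    fn build_provider_chain(
        inner: Box<dyn LlmProvider>,
        cache_enabled: bool, // parse_optional_env("RESPONSE_CACHE_ENABLED", false)
    ) -> Box<dyn LlmProvider> {
        if cache_enabled {
            Box::new(CachedProvider::new(inner))
        } else {
            inner
        }
    }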

@github-actions Bot added the scope: llm (LLM integration), size: L (200-499 changed lines), and risk: low (Changes to docs, tests, or low-risk modules) labels on Feb 21, 2026
@gemini-code-assist (Contributor) commented:

Summary of Changes

Hello @nlok5923, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request completes the LLM response cache implementation by addressing critical missing functionalities and improving its concurrency model. The primary goal was to ensure cache integrity and provide better operational visibility. By switching to a standard library mutex and implementing explicit cache invalidation on model changes, the system becomes more robust. Additionally, periodic logging of cache statistics offers valuable insights into its performance and effectiveness, enhancing the overall reliability and observability of the LLM provider.

Highlights

  • Concurrency Model Change: The cache lock was switched from tokio::sync::Mutex to std::sync::Mutex as the lock is never held across an .await point, allowing synchronous cache operations.
  • Cache Invalidation on Model Switch: The set_model() method now clears the response cache upon successful model change, as cached entries become stale. If the model switch fails, the cache remains untouched.
  • Periodic Cache Statistics Logging: The complete() method now logs cache statistics (total requests, hits, hit rate, entry count) every 100 requests using tracing::info!.
  • Synchronous Cache Operations: Methods like len(), is_empty(), total_hits(), and clear() on CachedProvider are now synchronous, reflecting the std::sync::Mutex usage.
  • New Test Coverage: Added new tests to specifically cover the cache clearing behavior of set_model() for both success and failure scenarios.
Changelog
  • src/llm/response_cache.rs
    • Replaced tokio::sync::Mutex with std::sync::Mutex for cache locking.
    • Introduced AtomicU64 for tracking total request count for statistics.
    • Added STATS_LOG_EVERY_N constant to control the frequency of stats logging.
    • Converted len(), is_empty(), total_hits(), and clear() methods to be synchronous.
    • Implemented periodic logging of cache hit/miss statistics within the complete() method.
    • Modified set_model() to clear the cache if the underlying model change is successful.
    • Added a new SwitchableStub test helper to simulate model switching (sketched after this list).
    • Updated existing tests to reflect the synchronous nature of cache access methods.
    • Introduced new tests to verify cache invalidation on set_model() success and non-invalidation on failure.
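
A hypothetical sketch of what a SwitchableStub-style helper could look like; the PR's actual test helper is not shown here, so the shape below is an assumption. It records the active model and can be told to reject a switch, exercising both set_model() paths:

    use std::sync::Mutex;

    // Illustrative stand-in for the PR's SwitchableStub test helper.
    struct SwitchableStub {
        model: Mutex<String>,
        reject_switch: bool,
    }

    impl SwitchableStub {
        fn set_model(&self, model: &str) -> Result<(), String> {
            if self.reject_switch {
                return Err(format!("cannot switch to {model}"));
            }
            *self.model.lock().unwrap() = model.to_string();
            Ok(())
        }

        fn active_model_name(&self) -> String {
            self.model.lock().unwrap().clone()
        }
    }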

@github-actions Bot added the contributor: regular (2-5 merged PRs) label on Feb 21, 2026
@gemini-code-assist Bot left a comment

Code Review

This pull request effectively completes the implementation of the LLM response cache by adding cache invalidation on model changes, periodic statistics logging, and corresponding tests. The structural change from tokio::sync::Mutex to std::sync::Mutex is a sensible simplification given that the lock is not held across await points. The implementation is solid, but I have a couple of suggestions for improving maintainability and caching efficiency.

Comment thread src/llm/response_cache.rs (outdated), on lines +233 to +243:

    if req_no.is_multiple_of(STATS_LOG_EVERY_N) {
        let total_hits: u64 = guard.values().map(|e| e.hit_count).sum();
        let hit_rate = total_hits as f64 / req_no as f64 * 100.0;
        tracing::info!(
            total_requests = req_no,
            total_hits,
            hit_rate_pct = format!("{hit_rate:.1}"),
            entry_count = guard.len(),
            "LLM response cache statistics"
        );
    }
@gemini-code-assist commented (severity: medium):

This block of code for logging statistics is duplicated from the cache-hit path (lines 179-191). To improve maintainability and avoid future inconsistencies, consider extracting this logic into a private helper function on CachedProvider.

For example, you could add a method like this:

    fn log_stats(&self, guard: &std::sync::MutexGuard<HashMap<String, CacheEntry>>, req_no: u64) {
        let total_hits: u64 = guard.values().map(|e| e.hit_count).sum();
        let hit_rate = total_hits as f64 / req_no as f64 * 100.0;
        tracing::info!(
            total_requests = req_no,
            total_hits,
            hit_rate_pct = format!("{hit_rate:.1}"),
            entry_count = guard.len(),
            "LLM response cache statistics"
        );
    }

And then call it from both the cache-hit and cache-miss paths inside the if req_no.is_multiple_of(...) block.

@nlok5923 (Author) replied:
Added a private maybe_log_stats(guard, req_no) helper that both the cache-hit and cache-miss paths now call, eliminating the ~10-line duplicated block.

Comment thread src/llm/response_cache.rs
Comment on lines 269 to 277
    fn set_model(&self, model: &str) -> Result<(), LlmError> {
    -    self.inner.set_model(model)
    +    let result = self.inner.set_model(model);
    +    if result.is_ok() {
    +        // All cached keys embed the previous model name, so they are now stale.
    +        self.cache.lock().unwrap_or_else(|e| e.into_inner()).clear();
    +        tracing::debug!(model, "LLM response cache cleared after model switch");
    +    }
    +    result
    }
@gemini-code-assist commented (severity: medium):

This implementation correctly invalidates the cache upon a model switch. This is necessary because the cache key is currently generated using model_name().

However, this approach clears the entire cache, losing all cached responses, which could be inefficient if you switch back and forth between models.

An alternative approach would be to use active_model_name() when generating the cache key in complete(). This would create distinct cache entries for each model, allowing the cache to hold responses for multiple models simultaneously.

This would involve two changes:

  1. In complete():

    let key = cache_key(&self.inner.active_model_name(), &request);
  2. In set_model(), you would no longer need to clear the cache:

    fn set_model(&self, model: &str) -> Result<(), LlmError> {
        self.inner.set_model(model)
    }

This would make the caching more robust to model switching. What are your thoughts on this alternative design?

@nlok5923 (Author) replied:

Removed the cache.clear() from set_model(). The cache key already embeds active_model_name() via
effective_model_name(), so switching models automatically routes requests to a different key. Old entries for the previous model survive — if you switch back, they'll be hit again rather than wasted. TTL/LRU handles cleanup naturally.

@nlok5923 force-pushed the feat/llm-response-cache-170 branch from b8a6fce to 9cff711 on February 21, 2026 at 12:49
@github-actions Bot added the scope: channel/cli (TUI / CLI channel) label on Feb 21, 2026
@nlok5923 force-pushed the feat/llm-response-cache-170 branch from 9cff711 to a19f5d6 on February 21, 2026 at 12:55

    …ging, sync mutex

    # Conflicts:
    #	src/llm/response_cache.rs

@nlok5923 force-pushed the feat/llm-response-cache-170 branch from a19f5d6 to a6e9277 on February 21, 2026 at 12:58
@serrrfirat (Collaborator) left a review:

Summary

A well-structured PR that adds stats logging and switches to std::sync::Mutex for the LLM response cache. The code is correct and the mutex invariant (never held across .await) is properly maintained. The main issue is a significant discrepancy between the PR description (which claims set_model clears the cache) and the implementation (which preserves cache entries and relies on per-model key isolation). The implementation approach is actually better than the described one, but the description should be corrected. The stats logging has a minor accuracy issue — evicted entries' hit counts are lost, causing the reported hit rate to undercount over time. A separate atomic counter for total hits would fix this. Test coverage for the new set_model behavior is good, but the stats logging feature has no tests at all.

General findings

[MEDIUM] PR description contradicts implementation: set_model does NOT clear the cache (src/llm/response_cache.rs:271)

The PR body states: "set_model() now clears the cache on success. If the inner provider rejects the switch, the cache is left untouched." However, the implementation at line 271-276 simply delegates to self.inner.set_model(model) without touching the cache at all. The code comment explicitly says "Entries for the old model remain valid: if we switch back, they will be hit again rather than wasted." The test set_model_isolates_per_model_via_key asserts cached.len() == 1 after set_model, confirming entries are preserved. The key-isolation approach is arguably better than clearing, but the PR description is misleading and should be updated to match the actual behavior.

Suggested fix:

Update the PR description to say: "`set_model()` relies on per-model cache key isolation — entries for the old model are preserved and reused if the model is switched back. TTL and LRU eviction clean up stale entries naturally."

Comment thread src/llm/response_cache.rs
    /// [`STATS_LOG_EVERY_N`]. Must be called while holding the cache lock so
    /// that hit counts and entry count are consistent.
    fn maybe_log_stats(guard: &HashMap<String, CacheEntry>, req_no: u64) {
        if req_no.is_multiple_of(STATS_LOG_EVERY_N) {
@serrrfirat (Collaborator) commented:
hit_rate_pct undercounts because evicted entries lose their hit_count

In maybe_log_stats, total_hits is computed as guard.values().map(|e| e.hit_count).sum() — the sum of hit counts across entries currently in the cache. When entries are evicted via TTL expiry (line 225: guard.retain(...)) or LRU eviction (line 228-235), their accumulated hit_count is permanently lost. Over time, the reported hit_rate_pct will drift downward from the true cache hit rate, especially under high eviction pressure. For example, if 500 requests hit a single entry that then expires, total_hits drops to 0 even though the cache served 500 hits.

Suggested fix:

Track total hits in a separate `AtomicU64` field (similar to `request_count`) that is incremented on every cache hit, independent of entry lifecycle. Use this for the hit rate calculation instead of summing per-entry counts.

Severity: low · Confidence: high

@nlok5923 (Author) replied:

Fixed. Added a total_hit_count: AtomicU64 field that is incremented on every cache hit and never decremented by TTL or LRU eviction. maybe_log_stats now takes this value as a parameter instead of summing per-entry hit_counts, so hit_rate_pct stays accurate even under high eviction pressure. Added total_hits_survives_eviction test to cover this scenario.

Comment thread src/llm/response_cache.rs
    let effective_model = self.inner.effective_model_name(request.model.as_deref());
    let key = cache_key(&effective_model, &request);
    let now = Instant::now();
    let req_no = self.request_count.fetch_add(1, Ordering::Relaxed) + 1;
@serrrfirat (Collaborator) commented:

Stats log silently skipped when provider error coincides with stats interval

The request_count is incremented unconditionally at line 184 (fetch_add(1, Relaxed) + 1), but maybe_log_stats is only reachable on the cache-hit path or the cache-miss-success path. If the inner provider returns an error (line 214: self.inner.complete(request).await?), the early ? return skips the stats log in the miss path. If this error happens on request number 100, 200, etc., no stats line is emitted and the next log won't appear until request 200, 300, etc. (assuming those succeed). For monitoring dashboards expecting regular stats intervals, this creates gaps.

Suggested fix:

Consider logging stats in a `defer`-style pattern, e.g. compute stats before the early return on error. Alternatively, document that stats logging is best-effort and may skip intervals on provider errors.

Severity: low · Confidence: medium

@nlok5923 (Author) replied:

Fixed. Changed self.inner.complete(request).await? to store the Result first, acquire the cache lock, then match on the result. On Err, maybe_log_stats is called before returning so milestone intervals are never silently skipped. Added stats_logged_on_provider_error_at_interval test to verify.
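
A simplified sketch of that pattern with stand-in types; maybe_log_stats here is a stub, not the PR's helper:

    use std::collections::HashMap;
    use std::sync::Mutex;

    fn maybe_log_stats(entry_count: usize, req_no: u64, total_hits: u64) {
        // Stand-in for the helper quoted above.
        if req_no % 100 == 0 {
            tracing::info!(
                total_requests = req_no,
                total_hits,
                entry_count = entry_count as u64,
                "LLM response cache statistics"
            );
        }
    }

    // Sketch: hold the provider result, take the lock, log stats, and only
    // then propagate an error, so milestone intervals are never skipped.
    async fn miss_path(
        cache: &Mutex<HashMap<String, String>>,
        key: String,
        req_no: u64,
        total_hits: u64,
        fetch: impl std::future::Future<Output = Result<String, String>>,
    ) -> Result<String, String> {
        let result = fetch.await; // no lock held across the await
        let mut guard = cache.lock().unwrap();
        if let Ok(response) = &result {
            guard.insert(key, response.clone());
        }
        maybe_log_stats(guard.len(), req_no, total_hits); // runs on Ok and Err
        result
    }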

Comment thread src/llm/response_cache.rs
    };

    /// How often (in requests) to emit a cache statistics log line.
    const STATS_LOG_EVERY_N: u64 = 100;
@serrrfirat (Collaborator) commented:

No test for periodic stats logging behavior

The STATS_LOG_EVERY_N constant and maybe_log_stats function are untested. There is no test verifying that: (1) stats are emitted every 100 requests, (2) stats are NOT emitted on non-milestone requests, (3) the logged values (total_requests, total_hits, hit_rate_pct, entry_count) are correct. Since this is a new feature called out in the PR summary, it should have at least a basic test.

Suggested fix:

Add a test using `tracing_subscriber::fmt::TestWriter` or the `tracing-test` crate to capture log output. Call `complete()` 100 times and assert that exactly one info-level log containing "LLM response cache statistics" is emitted with the expected field values.

Severity: low · Confidence: high

@nlok5923 (Author) replied:

Fixed. Added two tests using tracing-test: stats_logged_at_request_100 (asserts no log before request 100, then exactly one stats line at request 100) and stats_logged_on_provider_error_at_interval (verifies stats are still emitted when the provider errors at a milestone interval).
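
A sketch of the assertion mechanics only, using the tracing-test crate's #[traced_test] attribute and its generated logs_contain helper (deps: tokio, tracing, tracing-test). The real tests drive CachedProvider::complete() instead of emitting the line directly:

    use tracing_test::traced_test;

    // Emits the stats line only at request 100, then asserts it was captured.
    #[tokio::test]
    #[traced_test]
    async fn stats_line_is_observable() {
        for req_no in 1..=100u64 {
            if req_no % 100 == 0 {
                tracing::info!(total_requests = req_no, "LLM response cache statistics");
            }
        }
        assert!(logs_contain("LLM response cache statistics"));
    }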

@nlok5923 added a commit addressing the review comments:

    fix(llm): address response cache review comments

    - Add total_hit_count AtomicU64 that is never decremented on eviction;
      maybe_log_stats now uses this counter so hit_rate_pct stays accurate
      under high eviction pressure
    - Log cache stats before returning on provider error so milestone
      intervals (every 100 requests) are never silently skipped
    - Add tracing-test dev-dep and three new tests: total_hits_survives_eviction,
      stats_logged_at_request_100, stats_logged_on_provider_error_at_interval
    - Update PR description to reflect actual set_model() behavior (key
      isolation, not cache clear)

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions Bot added the scope: dependencies (Dependency updates) and risk: medium (Business logic, config, or moderate-risk modules) labels and removed the risk: low (Changes to docs, tests, or low-risk modules) label on Feb 25, 2026
@ilblackdragon merged commit 7806273 into nearai:main on Mar 6, 2026
4 checks passed
@github-actions Bot mentioned this pull request on Mar 6, 2026
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026

Labels

  • contributor: regular (2-5 merged PRs)
  • risk: medium (Business logic, config, or moderate-risk modules)
  • scope: channel/cli (TUI / CLI channel)
  • scope: dependencies (Dependency updates)
  • scope: llm (LLM integration)
  • size: L (200-499 changed lines)
