Fix(llm): complete response cache — set_model invalidation, stats logging, sync mutex #290
Summary of Changes: This pull request completes the LLM response cache implementation by addressing critical missing functionality and improving its concurrency model. The primary goal was to ensure cache integrity and provide better operational visibility. By switching to a standard library mutex and implementing explicit cache invalidation on model changes, the system becomes more robust. Additionally, periodic logging of cache statistics offers valuable insight into the cache's performance and effectiveness, improving the overall reliability and observability of the LLM provider.
Code Review
This pull request effectively completes the implementation of the LLM response cache by adding cache invalidation on model changes, periodic statistics logging, and corresponding tests. The structural change from tokio::sync::Mutex to std::sync::Mutex is a sensible simplification given that the lock is not held across await points. The implementation is solid, but I have a couple of suggestions for improving maintainability and caching efficiency.
if req_no.is_multiple_of(STATS_LOG_EVERY_N) {
    let total_hits: u64 = guard.values().map(|e| e.hit_count).sum();
    let hit_rate = total_hits as f64 / req_no as f64 * 100.0;
    tracing::info!(
        total_requests = req_no,
        total_hits,
        hit_rate_pct = format!("{hit_rate:.1}"),
        entry_count = guard.len(),
        "LLM response cache statistics"
    );
}
This block of code for logging statistics is duplicated from the cache-hit path (lines 179-191). To improve maintainability and avoid future inconsistencies, consider extracting this logic into a private helper function on CachedProvider.
For example, you could add a method like this:
fn log_stats(&self, guard: &std::sync::MutexGuard<HashMap<String, CacheEntry>>, req_no: u64) {
let total_hits: u64 = guard.values().map(|e| e.hit_count).sum();
let hit_rate = total_hits as f64 / req_no as f64 * 100.0;
tracing::info!(
total_requests = req_no,
total_hits,
hit_rate_pct = format!("{hit_rate:.1}"),
entry_count = guard.len(),
"LLM response cache statistics"
);
}

And then call it from both the cache-hit and cache-miss paths inside the if req_no.is_multiple_of(...) block.
Added a private maybe_log_stats(guard, req_no) helper that both the cache-hit and cache-miss paths now call, eliminating the ~10-line duplicated block.
 fn set_model(&self, model: &str) -> Result<(), LlmError> {
-    self.inner.set_model(model)
+    let result = self.inner.set_model(model);
+    if result.is_ok() {
+        // All cached keys embed the previous model name, so they are now stale.
+        self.cache.lock().unwrap_or_else(|e| e.into_inner()).clear();
+        tracing::debug!(model, "LLM response cache cleared after model switch");
+    }
+    result
 }
This implementation correctly invalidates the cache upon a model switch. This is necessary because the cache key is currently generated using model_name().
However, this approach clears the entire cache, losing all cached responses, which could be inefficient if you switch back and forth between models.
An alternative approach would be to use active_model_name() when generating the cache key in complete(). This would create distinct cache entries for each model, allowing the cache to hold responses for multiple models simultaneously.
This would involve two changes:

1. In complete():

   let key = cache_key(&self.inner.active_model_name(), &request);

2. In set_model(), you would no longer need to clear the cache:

   fn set_model(&self, model: &str) -> Result<(), LlmError> {
       self.inner.set_model(model)
   }

This would make the caching more robust to model switching. What are your thoughts on this alternative design?
Removed the cache.clear() from set_model(). The cache key already embeds active_model_name() via effective_model_name(), so switching models automatically routes requests to a different key. Old entries for the previous model survive: if you switch back, they'll be hit again rather than wasted. TTL/LRU handles cleanup naturally.
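The per-model key isolation described here can be sketched with a toy key function. This is an illustrative stand-in using std hashing, not the crate's real cache_key (which embeds effective_model_name() and the full request):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative stand-in for the real cache_key: the model name is hashed
// into the key, so identical prompts under different models occupy
// different cache slots.
fn cache_key(model: &str, prompt: &str) -> u64 {
    let mut h = DefaultHasher::new();
    model.hash(&mut h);
    prompt.hash(&mut h);
    h.finish()
}

fn main() {
    let before_switch = cache_key("model-a", "hello");
    let after_switch = cache_key("model-b", "hello");
    let switched_back = cache_key("model-a", "hello");
    // Different model, same prompt: distinct slot, so no invalidation is needed.
    assert_ne!(before_switch, after_switch);
    // Switching back produces the old key, so the old entry is hit again.
    assert_eq!(before_switch, switched_back);
}
```

Because the model name participates in the key, a model switch needs no explicit clear; stale entries simply stop being addressed and age out via TTL/LRU.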
serrrfirat left a comment
Summary
A well-structured PR that adds stats logging and switches to std::sync::Mutex for the LLM response cache. The code is correct and the mutex invariant (never held across .await) is properly maintained. The main issue is a significant discrepancy between the PR description (which claims set_model clears the cache) and the implementation (which preserves cache entries and relies on per-model key isolation). The implementation approach is actually better than the described one, but the description should be corrected. The stats logging has a minor accuracy issue — evicted entries' hit counts are lost, causing the reported hit rate to undercount over time. A separate atomic counter for total hits would fix this. Test coverage for the new set_model behavior is good, but the stats logging feature has no tests at all.
General findings
[MEDIUM] PR description contradicts implementation: set_model does NOT clear the cache (src/llm/response_cache.rs:271)
The PR body states: "set_model() now clears the cache on success. If the inner provider rejects the switch, the cache is left untouched." However, the implementation at line 271-276 simply delegates to self.inner.set_model(model) without touching the cache at all. The code comment explicitly says "Entries for the old model remain valid: if we switch back, they will be hit again rather than wasted." The test set_model_isolates_per_model_via_key asserts cached.len() == 1 after set_model, confirming entries are preserved. The key-isolation approach is arguably better than clearing, but the PR description is misleading and should be updated to match the actual behavior.
Suggested fix:
Update the PR description to say: "`set_model()` relies on per-model cache key isolation — entries for the old model are preserved and reused if the model is switched back. TTL and LRU eviction clean up stale entries naturally."

/// [`STATS_LOG_EVERY_N`]. Must be called while holding the cache lock so
/// that hit counts and entry count are consistent.
fn maybe_log_stats(guard: &HashMap<String, CacheEntry>, req_no: u64) {
    if req_no.is_multiple_of(STATS_LOG_EVERY_N) {
hit_rate_pct undercounts because evicted entries lose their hit_count
In maybe_log_stats, total_hits is computed as guard.values().map(|e| e.hit_count).sum() — the sum of hit counts across entries currently in the cache. When entries are evicted via TTL expiry (line 225: guard.retain(...)) or LRU eviction (line 228-235), their accumulated hit_count is permanently lost. Over time, the reported hit_rate_pct will drift downward from the true cache hit rate, especially under high eviction pressure. For example, if 500 requests hit a single entry that then expires, total_hits drops to 0 even though the cache served 500 hits.
Suggested fix:
Track total hits in a separate `AtomicU64` field (similar to `request_count`) that is incremented on every cache hit, independent of entry lifecycle. Use this for the hit rate calculation instead of summing per-entry counts.

Severity: low · Confidence: high
Fixed. Added a total_hit_count: AtomicU64 field that is incremented on every cache hit and never decremented by TTL or LRU eviction. maybe_log_stats now takes this value as a parameter instead of summing per-entry hit_counts, so hit_rate_pct stays accurate even under high eviction pressure. Added total_hits_survives_eviction test to cover this scenario.
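The difference between the two counting strategies can be sketched with plain std types. The struct below is a simplified, hypothetical stand-in for the real cache; only the counting behavior matters:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

// Simplified stand-in for the cache: per-entry hit counts live in the map,
// while total_hit_count is a monotone counter that eviction never touches.
struct Cache {
    entries: HashMap<String, u64>, // key -> per-entry hit_count
    total_hit_count: AtomicU64,    // incremented on every hit, never decremented
}

impl Cache {
    fn hit(&mut self, key: &str) {
        *self.entries.entry(key.to_string()).or_insert(0) += 1;
        self.total_hit_count.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let mut cache = Cache {
        entries: HashMap::new(),
        total_hit_count: AtomicU64::new(0),
    };
    for _ in 0..500 {
        cache.hit("prompt");
    }
    // TTL/LRU eviction removes the entry together with its accumulated count.
    cache.entries.remove("prompt");
    let summed: u64 = cache.entries.values().sum();
    assert_eq!(summed, 0); // summing per-entry counts forgets the 500 hits
    assert_eq!(cache.total_hit_count.load(Ordering::Relaxed), 500); // atomic survives
}
```

This mirrors the reviewer's example: 500 hits on a single entry that then expires leave the per-entry sum at 0, while the atomic counter keeps the true total.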
let effective_model = self.inner.effective_model_name(request.model.as_deref());
let key = cache_key(&effective_model, &request);
let now = Instant::now();
let req_no = self.request_count.fetch_add(1, Ordering::Relaxed) + 1;
Stats log silently skipped when provider error coincides with stats interval
The request_count is incremented unconditionally at line 184 (fetch_add(1, Relaxed) + 1), but maybe_log_stats is only reachable on the cache-hit path or the cache-miss-success path. If the inner provider returns an error (line 214: self.inner.complete(request).await?), the early ? return skips the stats log in the miss path. If this error happens on request number 100, 200, etc., no stats line is emitted and the next log won't appear until request 200, 300, etc. (assuming those succeed). For monitoring dashboards expecting regular stats intervals, this creates gaps.
Suggested fix:
Consider logging stats in a `defer`-style pattern, e.g. compute stats before the early return on error. Alternatively, document that stats logging is best-effort and may skip intervals on provider errors.

Severity: low · Confidence: medium
Fixed. Changed self.inner.complete(request).await? to store the Result first, acquire the cache lock, then match on the result. On Err, maybe_log_stats is called before returning so milestone intervals are never silently skipped. Added stats_logged_on_provider_error_at_interval test to verify.
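A minimal sketch of the control-flow change (assumed shapes; a Vec<u64> stands in for tracing::info! so the milestone logs are observable): matching on the stored Result runs the stats hook on both arms before returning.

```rust
const STATS_LOG_EVERY_N: u64 = 100;

// Records the request number whenever a milestone is reached; a stand-in
// for the real maybe_log_stats, which emits a tracing::info! line.
fn maybe_log_stats(req_no: u64, log: &mut Vec<u64>) {
    if req_no % STATS_LOG_EVERY_N == 0 {
        log.push(req_no);
    }
}

// Before the fix, `let resp = provider_result?;` returned early on Err and
// skipped the stats hook. Matching on the stored Result runs it either way.
fn complete(
    req_no: u64,
    provider_result: Result<String, String>,
    log: &mut Vec<u64>,
) -> Result<String, String> {
    match provider_result {
        Ok(resp) => {
            maybe_log_stats(req_no, log);
            Ok(resp)
        }
        Err(e) => {
            maybe_log_stats(req_no, log); // stats still emitted on provider error
            Err(e)
        }
    }
}

fn main() {
    let mut log = Vec::new();
    let _ = complete(100, Err("provider down".to_string()), &mut log);
    assert_eq!(log, vec![100]); // milestone not skipped despite the error
}
```

The error still propagates to the caller; only the early return that jumped over the stats block is gone.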
};

/// How often (in requests) to emit a cache statistics log line.
const STATS_LOG_EVERY_N: u64 = 100;
No test for periodic stats logging behavior
The STATS_LOG_EVERY_N constant and maybe_log_stats function are untested. There is no test verifying that: (1) stats are emitted every 100 requests, (2) stats are NOT emitted on non-milestone requests, (3) the logged values (total_requests, total_hits, hit_rate_pct, entry_count) are correct. Since this is a new feature called out in the PR summary, it should have at least a basic test.
Suggested fix:
Add a test using `tracing_subscriber::fmt::TestWriter` or the `tracing-test` crate to capture log output. Call `complete()` 100 times and assert that exactly one info-level log containing "LLM response cache statistics" is emitted with the expected field values.

Severity: low · Confidence: high
Fixed. Added two tests using tracing-test: stats_logged_at_request_100 (asserts no log before request 100, then exactly one stats line at request 100) and stats_logged_on_provider_error_at_interval (verifies stats are still emitted when the provider errors at a milestone interval).
- Add total_hit_count AtomicU64 that is never decremented on eviction; maybe_log_stats now uses this counter so hit_rate_pct stays accurate under high eviction pressure
- Log cache stats before returning on provider error so milestone intervals (every 100 requests) are never silently skipped
- Add tracing-test dev-dep and three new tests: total_hits_survives_eviction, stats_logged_at_request_100, stats_logged_on_provider_error_at_interval
- Update PR description to reflect actual set_model() behavior (key isolation, not cache clear)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ging, sync mutex (nearai#290)

* fix(llm): complete response cache — set_model invalidation, stats logging, sync mutex

  # Conflicts:
  #   src/llm/response_cache.rs

* fix(llm): address response cache review comments

  - Add total_hit_count AtomicU64 that is never decremented on eviction; maybe_log_stats now uses this counter so hit_rate_pct stays accurate under high eviction pressure
  - Log cache stats before returning on provider error so milestone intervals (every 100 requests) are never silently skipped
  - Add tracing-test dev-dep and three new tests: total_hits_survives_eviction, stats_logged_at_request_100, stats_logged_on_provider_error_at_interval
  - Update PR description to reflect actual set_model() behavior (key isolation, not cache clear)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary

Completes the LLM response cache (`CachedProvider`) for issue #170. The decorator and wiring were already in place; three acceptance criteria were missing:

- `set_model()` relies on per-model cache key isolation. Cache keys embed `active_model_name()` via `effective_model_name()`, so switching models automatically routes requests to a separate cache slot. Entries for the old model are preserved: if you switch back, they will be hit again rather than wasted. Natural TTL and LRU eviction clean up stale entries. No explicit cache clear is performed on model switch.
- No periodic stats logging. `complete()` now emits a `tracing::info!` line every 100 requests with `total_requests`, `total_hits`, `hit_rate_pct`, and `entry_count`. Stats are emitted even when the inner provider returns an error so that milestone intervals are never silently skipped.
- Missing `set_model()` tests. Two new tests cover the success path (per-model key isolation, cache preserved and reused on switch-back) and the failure path (cache preserved on error).

Structural change

Switched the cache lock from `tokio::sync::Mutex` to `std::sync::Mutex`. The lock is never held across an `.await`: both lock scopes in `complete()` contain only CPU-bound HashMap operations, with the real `.await` in between. This lets `set_model()` (a synchronous `fn` in the `LlmProvider` trait) clear the cache directly, and makes `len()`, `is_empty()`, `total_hits()`, and `clear()` synchronous.

Added a `total_hit_count: AtomicU64` field that is incremented on every cache hit and never decremented on eviction. `maybe_log_stats` uses this counter instead of summing per-entry `hit_count` values, so `hit_rate_pct` stays accurate under high eviction pressure.
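The locking discipline behind the std::sync::Mutex switch can be sketched as follows (illustrative names, not the crate's real functions; the guard is confined to a synchronous block, so it can never be held across the provider's .await):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Scope 1: check the cache. The guard is dropped at the end of the inner
// block, before the (elided) .await on the inner provider would run.
fn lookup(cache: &Mutex<HashMap<String, String>>, key: &str) -> Option<String> {
    let hit = {
        let guard = cache.lock().unwrap_or_else(|e| e.into_inner());
        guard.get(key).cloned()
    }; // guard dropped here; no MutexGuard alive across any await point
    hit
}

// Scope 2: store the response after the provider call has completed.
fn store(cache: &Mutex<HashMap<String, String>>, key: &str, value: &str) {
    let mut guard = cache.lock().unwrap_or_else(|e| e.into_inner());
    guard.insert(key.to_string(), value.to_string());
}

fn main() {
    let cache = Mutex::new(HashMap::new());
    assert!(lookup(&cache, "req").is_none()); // miss
    store(&cache, "req", "resp");
    assert_eq!(lookup(&cache, "req").as_deref(), Some("resp")); // hit
}
```

Because both scopes are plain synchronous functions, they can also be called from a synchronous trait method such as set_model(), which is what the switch away from tokio::sync::Mutex enables. The unwrap_or_else(|e| e.into_inner()) recovers the guard from a poisoned lock instead of panicking.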
Test plan

- `cargo test response_cache`: 20/20 pass, 0 warnings
- `cargo clippy --all --benches --tests --examples --all-features`: clean
- `RESPONSE_CACHE_ENABLED=true` wires `CachedProvider` in `build_provider_chain()` (src/llm/mod.rs:415)
- `parse_optional_env("RESPONSE_CACHE_ENABLED", false)` (src/config/llm.rs:237)