Fix(llm): complete response cache — set_model invalidation, stats logging, sync mutex#290

Merged

ilblackdragon merged 3 commits into nearai:main from nlok5923:feat/llm-response-cache-170 on Mar 6, 2026

Conversation

@nlok5923 (Contributor) commented on Feb 21, 2026

Summary

Completes the LLM response cache (CachedProvider) for issue #170. The
decorator and wiring were already in place; three acceptance criteria were missing:

  • set_model() relies on per-model cache key isolation. Cache keys embed
    active_model_name() via effective_model_name(), so switching models
    automatically routes requests to a separate cache slot. Entries for the old
    model are preserved: if you switch back, they are hit again rather than
    wasted. Natural TTL and LRU eviction clean up stale entries, so no explicit
    cache clear is performed on model switch (see the sketch after this list).

  • Periodic stats logging was missing. complete() now emits a tracing::info!
    line every 100 requests with total_requests, total_hits,
    hit_rate_pct, and entry_count. Stats are emitted even when the inner
    provider returns an error, so milestone intervals are never silently
    skipped.

  • Missing set_model() tests. Two new tests cover the success path
    (per-model key isolation, cache preserved and reused on switch-back) and
    the failure path (cache preserved on error).
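
A minimal sketch of the key-isolation idea from the first bullet. The shape follows the discussion in this PR, but the exact signatures in src/llm/response_cache.rs are assumptions here:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Hypothetical sketch: the key embeds the effective model name, so each
    // model gets its own key space and no explicit clear is needed when
    // set_model() switches models.
    fn cache_key(effective_model: &str, request_body: &str) -> String {
        let mut hasher = DefaultHasher::new();
        request_body.hash(&mut hasher);
        format!("{effective_model}:{:016x}", hasher.finish())
    }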

Structural change

Switched the cache lock from tokio::sync::Mutex to std::sync::Mutex.
The lock is never held across an .await: both lock scopes in complete()
contain only CPU-bound HashMap operations, with the real .await in between.
This also lets synchronous code such as set_model() (a plain fn in the
LlmProvider trait) access the cache directly when needed, and makes len(),
is_empty(), total_hits(), and clear() synchronous.
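
A simplified sketch of that lock discipline, with stand-in types rather than the PR's actual signatures; the guard is confined to a block and dropped before the .await:

    use std::collections::HashMap;
    use std::sync::Mutex;

    // Sketch only: the std::sync::Mutex guard never lives across an .await,
    // so a non-async mutex is safe inside this async fn.
    async fn complete_cached(
        cache: &Mutex<HashMap<String, String>>,
        key: String,
        fetch: impl std::future::Future<Output = String>,
    ) -> String {
        {
            let guard = cache.lock().unwrap();
            if let Some(hit) = guard.get(&key) {
                return hit.clone();
            }
        } // guard dropped here, before the await below
        let response = fetch.await; // no lock is held across this await
        cache.lock().unwrap().insert(key, response.clone());
        response
    }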

Added a total_hit_count: AtomicU64 field that is incremented on every cache
hit and never decremented on eviction. maybe_log_stats uses this counter
instead of summing per-entry hit_count values, so hit_rate_pct stays
accurate under high eviction pressure.
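
Sketched below with assumed names (total_hit_count and request_count mirror the PR text; the surrounding struct is illustrative, not the actual CachedProvider):

    use std::sync::atomic::{AtomicU64, Ordering};

    // Illustrative only: hits are counted in an atomic that eviction never
    // touches, so the reported hit rate cannot drift downward as entries expire.
    struct CacheStats {
        request_count: AtomicU64,
        total_hit_count: AtomicU64,
    }

    impl CacheStats {
        fn record_hit(&self) {
            self.total_hit_count.fetch_add(1, Ordering::Relaxed);
        }

        fn hit_rate_pct(&self) -> f64 {
            let requests = self.request_count.load(Ordering::Relaxed).max(1);
            let hits = self.total_hit_count.load(Ordering::Relaxed);
            hits as f64 / requests as f64 * 100.0
        }
    }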

Test plan

  • cargo test response_cache — 20/20 pass, 0 warnings
  • cargo clippy --all --benches --tests --examples --all-features — clean
  • RESPONSE_CACHE_ENABLED=true wires CachedProvider in build_provider_chain() (src/llm/mod.rs:415); a sketch follows this list
  • Cache disabled by default — parse_optional_env("RESPONSE_CACHE_ENABLED", false) (src/config/llm.rs:237)
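
A sketch of that decorator wiring. build_provider_chain() and the env flag are the repo's names from the list above; the stand-in trait and body below are assumptions, not the actual code:

    // Minimal stand-ins; the real trait lives in src/llm.
    trait LlmProvider {}

    struct CachedProvider {
        inner: Box<dyn LlmProvider>,
    }

    impl CachedProvider {
        fn new(inner: Box<dyn LlmProvider>) -> Self {
            Self { inner }
        }
    }

    impl LlmProvider for CachedProvider {}

    // Assumed shape: the cache is an opt-in decorator around whatever
    // provider the chain would otherwise return.
    fn build_provider_chain(
        inner: Box<dyn LlmProvider>,
        cache_enabled: bool, // parse_optional_env("RESPONSE_CACHE_ENABLED", false)
    ) -> Box<dyn LlmProvider> {
        if cache_enabled {
            Box::new(CachedProvider::new(inner))
        } else {
            inner
        }
    }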

@github-actions Bot added the scope: llm (LLM integration), size: L (200-499 changed lines), and risk: low (Changes to docs, tests, or low-risk modules) labels on Feb 21, 2026
@gemini-code-assist (Contributor) commented:

Summary of Changes

Hello @nlok5923, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request completes the LLM response cache implementation by addressing critical missing functionalities and improving its concurrency model. The primary goal was to ensure cache integrity and provide better operational visibility. By switching to a standard library mutex and implementing explicit cache invalidation on model changes, the system becomes more robust. Additionally, periodic logging of cache statistics offers valuable insights into its performance and effectiveness, enhancing the overall reliability and observability of the LLM provider.

Highlights

  • Concurrency Model Change: The cache lock was switched from tokio::sync::Mutex to std::sync::Mutex as the lock is never held across an .await point, allowing synchronous cache operations.
  • Cache Invalidation on Model Switch: The set_model() method now clears the response cache upon successful model change, as cached entries become stale. If the model switch fails, the cache remains untouched.
  • Periodic Cache Statistics Logging: The complete() method now logs cache statistics (total requests, hits, hit rate, entry count) every 100 requests using tracing::info!.
  • Synchronous Cache Operations: Methods like len(), is_empty(), total_hits(), and clear() on CachedProvider are now synchronous, reflecting the std::sync::Mutex usage.
  • New Test Coverage: Added new tests to specifically cover the cache clearing behavior of set_model() for both success and failure scenarios.
Changelog
  • src/llm/response_cache.rs
    • Replaced tokio::sync::Mutex with std::sync::Mutex for cache locking.
    • Introduced AtomicU64 for tracking total request count for statistics.
    • Added STATS_LOG_EVERY_N constant to control the frequency of stats logging.
    • Converted len(), is_empty(), total_hits(), and clear() methods to be synchronous.
    • Implemented periodic logging of cache hit/miss statistics within the complete() method.
    • Modified set_model() to clear the cache if the underlying model change is successful.
    • Added a new SwitchableStub test helper to simulate model switching (sketched after this list).
    • Updated existing tests to reflect the synchronous nature of cache access methods.
    • Introduced new tests to verify cache invalidation on set_model() success and non-invalidation on failure.
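
A hypothetical sketch of what a SwitchableStub-style helper could look like; the PR's actual test helper is not shown here, so the shape below is an assumption. It records the active model and can be told to reject a switch, exercising both set_model() paths:

    use std::sync::Mutex;

    // Illustrative stand-in for the PR's SwitchableStub test helper.
    struct SwitchableStub {
        model: Mutex<String>,
        reject_switch: bool,
    }

    impl SwitchableStub {
        fn set_model(&self, model: &str) -> Result<(), String> {
            if self.reject_switch {
                return Err(format!("cannot switch to {model}"));
            }
            *self.model.lock().unwrap() = model.to_string();
            Ok(())
        }

        fn active_model_name(&self) -> String {
            self.model.lock().unwrap().clone()
        }
    }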

@github-actions Bot added the contributor: regular (2-5 merged PRs) label on Feb 21, 2026
@gemini-code-assist Bot left a comment

Code Review

This pull request effectively completes the implementation of the LLM response cache by adding cache invalidation on model changes, periodic statistics logging, and corresponding tests. The structural change from tokio::sync::Mutex to std::sync::Mutex is a sensible simplification given that the lock is not held across await points. The implementation is solid, but I have a couple of suggestions for improving maintainability and caching efficiency.

Comment thread src/llm/response_cache.rs (outdated), on lines +233 to +243:

    if req_no.is_multiple_of(STATS_LOG_EVERY_N) {
        let total_hits: u64 = guard.values().map(|e| e.hit_count).sum();
        let hit_rate = total_hits as f64 / req_no as f64 * 100.0;
        tracing::info!(
            total_requests = req_no,
            total_hits,
            hit_rate_pct = format!("{hit_rate:.1}"),
            entry_count = guard.len(),
            "LLM response cache statistics"
        );
    }
@gemini-code-assist commented (severity: medium):

This block of code for logging statistics is duplicated from the cache-hit path (lines 179-191). To improve maintainability and avoid future inconsistencies, consider extracting this logic into a private helper function on CachedProvider.

For example, you could add a method like this:

    fn log_stats(&self, guard: &std::sync::MutexGuard<HashMap<String, CacheEntry>>, req_no: u64) {
        let total_hits: u64 = guard.values().map(|e| e.hit_count).sum();
        let hit_rate = total_hits as f64 / req_no as f64 * 100.0;
        tracing::info!(
            total_requests = req_no,
            total_hits,
            hit_rate_pct = format!("{hit_rate:.1}"),
            entry_count = guard.len(),
            "LLM response cache statistics"
        );
    }

And then call it from both the cache-hit and cache-miss paths inside the if req_no.is_multiple_of(...) block.

@nlok5923 (Author) replied:
Added a private maybe_log_stats(guard, req_no) helper that both the cache-hit and cache-miss paths now call, eliminating the ~10-line duplicated block.

Comment thread src/llm/response_cache.rs
Comment on lines 269 to 277
    fn set_model(&self, model: &str) -> Result<(), LlmError> {
    -    self.inner.set_model(model)
    +    let result = self.inner.set_model(model);
    +    if result.is_ok() {
    +        // All cached keys embed the previous model name, so they are now stale.
    +        self.cache.lock().unwrap_or_else(|e| e.into_inner()).clear();
    +        tracing::debug!(model, "LLM response cache cleared after model switch");
    +    }
    +    result
    }
@gemini-code-assist commented (severity: medium):

This implementation correctly invalidates the cache upon a model switch. This is necessary because the cache key is currently generated using model_name().

However, this approach clears the entire cache, losing all cached responses, which could be inefficient if you switch back and forth between models.

An alternative approach would be to use active_model_name() when generating the cache key in complete(). This would create distinct cache entries for each model, allowing the cache to hold responses for multiple models simultaneously.

This would involve two changes:

  1. In complete():

    let key = cache_key(&self.inner.active_model_name(), &request);
  2. In set_model(), you would no longer need to clear the cache:

    fn set_model(&self, model: &str) -> Result<(), LlmError> {
        self.inner.set_model(model)
    }

This would make the caching more robust to model switching. What are your thoughts on this alternative design?

@nlok5923 (Author) replied:

Removed the cache.clear() from set_model(). The cache key already embeds active_model_name() via
effective_model_name(), so switching models automatically routes requests to a different key. Old entries for the previous model survive — if you switch back, they'll be hit again rather than wasted. TTL/LRU handles cleanup naturally.

@nlok5923 force-pushed the feat/llm-response-cache-170 branch from b8a6fce to 9cff711 on February 21, 2026 at 12:49
@github-actions Bot added the scope: channel/cli (TUI / CLI channel) label on Feb 21, 2026
@nlok5923 force-pushed the feat/llm-response-cache-170 branch from 9cff711 to a19f5d6 on February 21, 2026 at 12:55

    …ging, sync mutex

    # Conflicts:
    #	src/llm/response_cache.rs

@nlok5923 force-pushed the feat/llm-response-cache-170 branch from a19f5d6 to a6e9277 on February 21, 2026 at 12:58
@serrrfirat (Collaborator) left a review:

Summary

A well-structured PR that adds stats logging and switches to std::sync::Mutex for the LLM response cache. The code is correct and the mutex invariant (never held across .await) is properly maintained. The main issue is a significant discrepancy between the PR description (which claims set_model clears the cache) and the implementation (which preserves cache entries and relies on per-model key isolation). The implementation approach is actually better than the described one, but the description should be corrected. The stats logging has a minor accuracy issue — evicted entries' hit counts are lost, causing the reported hit rate to undercount over time. A separate atomic counter for total hits would fix this. Test coverage for the new set_model behavior is good, but the stats logging feature has no tests at all.

General findings

[MEDIUM] PR description contradicts implementation: set_model does NOT clear the cache (src/llm/response_cache.rs:271)

The PR body states: "set_model() now clears the cache on success. If the inner provider rejects the switch, the cache is left untouched." However, the implementation at line 271-276 simply delegates to self.inner.set_model(model) without touching the cache at all. The code comment explicitly says "Entries for the old model remain valid: if we switch back, they will be hit again rather than wasted." The test set_model_isolates_per_model_via_key asserts cached.len() == 1 after set_model, confirming entries are preserved. The key-isolation approach is arguably better than clearing, but the PR description is misleading and should be updated to match the actual behavior.

Suggested fix:

Update the PR description to say: "`set_model()` relies on per-model cache key isolation — entries for the old model are preserved and reused if the model is switched back. TTL and LRU eviction clean up stale entries naturally."

Comment thread src/llm/response_cache.rs
    /// [`STATS_LOG_EVERY_N`]. Must be called while holding the cache lock so
    /// that hit counts and entry count are consistent.
    fn maybe_log_stats(guard: &HashMap<String, CacheEntry>, req_no: u64) {
        if req_no.is_multiple_of(STATS_LOG_EVERY_N) {
@serrrfirat (Collaborator) commented:
hit_rate_pct undercounts because evicted entries lose their hit_count

In maybe_log_stats, total_hits is computed as guard.values().map(|e| e.hit_count).sum() — the sum of hit counts across entries currently in the cache. When entries are evicted via TTL expiry (line 225: guard.retain(...)) or LRU eviction (line 228-235), their accumulated hit_count is permanently lost. Over time, the reported hit_rate_pct will drift downward from the true cache hit rate, especially under high eviction pressure. For example, if 500 requests hit a single entry that then expires, total_hits drops to 0 even though the cache served 500 hits.

Suggested fix:

Track total hits in a separate `AtomicU64` field (similar to `request_count`) that is incremented on every cache hit, independent of entry lifecycle. Use this for the hit rate calculation instead of summing per-entry counts.

Severity: low · Confidence: high

@nlok5923 (Author) replied:

Fixed. Added a total_hit_count: AtomicU64 field that is incremented on every cache hit and never decremented by TTL or LRU eviction. maybe_log_stats now takes this value as a parameter instead of summing per-entry hit_counts, so hit_rate_pct stays accurate even under high eviction pressure. Added total_hits_survives_eviction test to cover this scenario.

Comment thread src/llm/response_cache.rs
    let effective_model = self.inner.effective_model_name(request.model.as_deref());
    let key = cache_key(&effective_model, &request);
    let now = Instant::now();
    let req_no = self.request_count.fetch_add(1, Ordering::Relaxed) + 1;
@serrrfirat (Collaborator) commented:

Stats log silently skipped when provider error coincides with stats interval

The request_count is incremented unconditionally at line 184 (fetch_add(1, Relaxed) + 1), but maybe_log_stats is only reachable on the cache-hit path or the cache-miss-success path. If the inner provider returns an error (line 214: self.inner.complete(request).await?), the early ? return skips the stats log in the miss path. If this error happens on request number 100, 200, etc., no stats line is emitted and the next log won't appear until request 200, 300, etc. (assuming those succeed). For monitoring dashboards expecting regular stats intervals, this creates gaps.

Suggested fix:

Consider logging stats in a `defer`-style pattern, e.g. compute stats before the early return on error. Alternatively, document that stats logging is best-effort and may skip intervals on provider errors.

Severity: low · Confidence: medium

@nlok5923 (Author) replied:

Fixed. Changed self.inner.complete(request).await? to store the Result first, acquire the cache lock, then match on the result. On Err, maybe_log_stats is called before returning so milestone intervals are never silently skipped. Added stats_logged_on_provider_error_at_interval test to verify.
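
A simplified sketch of that pattern with stand-in types; maybe_log_stats here is a stub, not the PR's helper:

    use std::collections::HashMap;
    use std::sync::Mutex;

    fn maybe_log_stats(entry_count: usize, req_no: u64, total_hits: u64) {
        // Stand-in for the helper quoted above.
        if req_no % 100 == 0 {
            tracing::info!(
                total_requests = req_no,
                total_hits,
                entry_count = entry_count as u64,
                "LLM response cache statistics"
            );
        }
    }

    // Sketch: hold the provider result, take the lock, log stats, and only
    // then propagate an error, so milestone intervals are never skipped.
    async fn miss_path(
        cache: &Mutex<HashMap<String, String>>,
        key: String,
        req_no: u64,
        total_hits: u64,
        fetch: impl std::future::Future<Output = Result<String, String>>,
    ) -> Result<String, String> {
        let result = fetch.await; // no lock held across the await
        let mut guard = cache.lock().unwrap();
        if let Ok(response) = &result {
            guard.insert(key, response.clone());
        }
        maybe_log_stats(guard.len(), req_no, total_hits); // runs on Ok and Err
        result
    }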

Comment thread src/llm/response_cache.rs
    };

    /// How often (in requests) to emit a cache statistics log line.
    const STATS_LOG_EVERY_N: u64 = 100;
@serrrfirat (Collaborator) commented:

No test for periodic stats logging behavior

The STATS_LOG_EVERY_N constant and maybe_log_stats function are untested. There is no test verifying that: (1) stats are emitted every 100 requests, (2) stats are NOT emitted on non-milestone requests, (3) the logged values (total_requests, total_hits, hit_rate_pct, entry_count) are correct. Since this is a new feature called out in the PR summary, it should have at least a basic test.

Suggested fix:

Add a test using `tracing_subscriber::fmt::TestWriter` or the `tracing-test` crate to capture log output. Call `complete()` 100 times and assert that exactly one info-level log containing "LLM response cache statistics" is emitted with the expected field values.

Severity: low · Confidence: high

@nlok5923 (Author) replied:

Fixed. Added two tests using tracing-test: stats_logged_at_request_100 (asserts no log before request 100, then exactly one stats line at request 100) and stats_logged_on_provider_error_at_interval (verifies stats are still emitted when the provider errors at a milestone interval).
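
A sketch of the assertion mechanics only, using the tracing-test crate's #[traced_test] attribute and its generated logs_contain helper (deps: tokio, tracing, tracing-test). The real tests drive CachedProvider::complete() instead of emitting the line directly:

    use tracing_test::traced_test;

    // Emits the stats line only at request 100, then asserts it was captured.
    #[tokio::test]
    #[traced_test]
    async fn stats_line_is_observable() {
        for req_no in 1..=100u64 {
            if req_no % 100 == 0 {
                tracing::info!(total_requests = req_no, "LLM response cache statistics");
            }
        }
        assert!(logs_contain("LLM response cache statistics"));
    }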

@nlok5923 added a commit addressing the review comments:

    fix(llm): address response cache review comments

    - Add total_hit_count AtomicU64 that is never decremented on eviction;
      maybe_log_stats now uses this counter so hit_rate_pct stays accurate
      under high eviction pressure
    - Log cache stats before returning on provider error so milestone
      intervals (every 100 requests) are never silently skipped
    - Add tracing-test dev-dep and three new tests: total_hits_survives_eviction,
      stats_logged_at_request_100, stats_logged_on_provider_error_at_interval
    - Update PR description to reflect actual set_model() behavior (key
      isolation, not cache clear)

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions Bot added the scope: dependencies (Dependency updates) and risk: medium (Business logic, config, or moderate-risk modules) labels and removed the risk: low (Changes to docs, tests, or low-risk modules) label on Feb 25, 2026
@ilblackdragon merged commit 7806273 into nearai:main on Mar 6, 2026
4 checks passed
@github-actions Bot mentioned this pull request on Mar 6, 2026
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026

Labels

  • contributor: regular (2-5 merged PRs)
  • risk: medium (Business logic, config, or moderate-risk modules)
  • scope: channel/cli (TUI / CLI channel)
  • scope: dependencies (Dependency updates)
  • scope: llm (LLM integration)
  • size: L (200-499 changed lines)
