feat: enable Anthropic prompt caching via automatic cache_control injection #660
Conversation
feat(llm): add Anthropic prompt caching and cache token tracking

- Inject cache_control via additional_params for Claude models in rig_adapter
- Add cache_read_input_tokens and cache_creation_input_tokens to CompletionResponse and ToolCompletionResponse
- Extract cached_input_tokens from rig-core unified Usage
- Add is_anthropic_model() detection helper with provider prefix support
- Log prompt cache hits at debug level (consistent with response_cache)
- Add 7 unit tests for cache injection and model detection
- Update all mock providers and test fixtures with new fields
feat(cost): apply 90% cache discount to prompt-cached tokens in CostGuard

- Add cache_read_input_tokens to TokenUsage so cache counts flow from CompletionResponse through the reasoning layer to the dispatcher
- Update CostGuard::record_llm_call() to accept cache_read_input_tokens: cached tokens are billed at 10% of the normal input rate
- Thread cache_read_input_tokens from dispatcher into CostGuard
- Add test_cache_discount_reduces_cost verifying exact savings match 90% of input cost for fully-cached requests
- Update all existing test callers with zero-cache parameter
refactor(cache): scope cache_control to Anthropic backend and validate model support

- Replace model-name-based is_anthropic_model() with explicit enable_prompt_cache flag on RigAdapter, set only for the direct Anthropic backend via with_prompt_cache(true)
- Add supports_prompt_cache() to validate model names per Anthropic docs: only Claude 3+ models support caching; claude-2 and claude-instant are excluded to prevent 400 errors
- Warn when caching is enabled but model does not support it
- Replace is_anthropic_model tests with flag-based and model validation tests
fix(cache): validate model at construction and propagate cache metrics through proxy

- Move supports_prompt_cache() check into with_prompt_cache() so unsupported models are detected once at construction, not per request
- Add cache_read_input_tokens and cache_creation_input_tokens to ProxyCompletionResponse and ProxyToolCompletionResponse with serde(default) for backward compatibility
- Pass cache metrics through orchestrator proxy instead of zeroing
- Use claude-opus-4-6 in cache discount test to match Anthropic semantics
feat(llm): add configurable cache retention with write surcharge

- Add CacheRetention enum (none/short/long) to AnthropicDirectConfig
- Parse ANTHROPIC_CACHE_RETENTION env var (default: short)
- Inject TTL-aware cache_control (short=5m ephemeral, long=1h)
- Extract cache_creation_input_tokens from raw Anthropic response
- Add cache_write_multiplier() to LlmProvider trait (1.25x short, 2.0x long)
- Pipe dynamic write multiplier through dispatcher to CostGuard
- Add TokenUsage.cache_creation_input_tokens field
- Add tests for Long TTL injection, 5m and 1h write surcharges
- Document ANTHROPIC_CACHE_RETENTION in .env.example
fix: resolve CI failures after upstream merge

- Add missing cost_per_token arg to cache test callsites
- Apply cargo fmt to long lines in tests and tracing macros
fix: address Copilot review feedback

- Use saturating_add for cache token sum to prevent u32 overflow
- Tighten supports_prompt_cache to explicitly match claude-3+/claude-4+ and named families (claude-sonnet/claude-opus/claude-haiku)
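Taken together, the model gate these two commits describe can be sketched as a plain name match. This is an illustrative reconstruction from the commit messages above, not the repo's exact code; the provider-prefix stripping is an assumption carried over from the earlier `is_anthropic_model()` helper.

```rust
/// Sketch of supports_prompt_cache per the commit messages: Claude 3+/4+
/// and the named families support caching; claude-2 and claude-instant
/// are excluded to avoid 400 errors. Handling of a provider prefix such
/// as "anthropic/claude-..." is an assumption.
fn supports_prompt_cache(model: &str) -> bool {
    let name = model.rsplit('/').next().unwrap_or(model);
    if name.starts_with("claude-2") || name.starts_with("claude-instant") {
        return false;
    }
    name.starts_with("claude-3")
        || name.starts_with("claude-4")
        || name.starts_with("claude-sonnet")
        || name.starts_with("claude-opus")
        || name.starts_with("claude-haiku")
}
```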
fix: adapt prompt caching to registry architecture and add missing cache fields

- Resolve merge conflicts: adapt CacheRetention and cache injection to the declarative provider registry (RegistryProviderConfig replaces AnthropicDirectConfig)
- Parse ANTHROPIC_CACHE_RETENTION env var in create_anthropic_from_registry()
- Use Anthropic automatic caching via top-level cache_control in additional_params (rig-core #[serde(flatten)] places it at request root)
- Add cache_read/creation_input_tokens fields to all mock LlmProviders added on main after PR #291 branched (response_cache, dispatcher, provider_chaos, trace_llm)
- Suppress clippy::too_many_arguments on record_llm_call and build_rig_request
- Add regression tests for cache injection (short/long/none) and cache_write_multiplier values

Co-Authored-By: Canvinus <44225021+Canvinus@users.noreply.github.com>
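The net effect of the injection these commits describe can be sketched with serde_json. The JSON shapes match the test plan at the bottom of this PR; the helper function itself is hypothetical, standing in for wherever the RigAdapter merges the value into rig-core's `additional_params`.

```rust
use serde_json::{json, Value};

/// Illustrative: the top-level cache_control object merged into rig-core's
/// additional_params (rig-core's #[serde(flatten)] places it at the request
/// root). Shapes match the PR's test plan; the helper is hypothetical.
fn cache_control_value(retention: &str) -> Option<Value> {
    match retention {
        "short" => Some(json!({ "cache_control": { "type": "ephemeral" } })),
        "long" => Some(json!({ "cache_control": { "type": "ephemeral", "ttl": "1h" } })),
        _ => None, // "none": omit cache_control entirely
    }
}
```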
Pull request overview
This PR enables Anthropic prompt caching for the direct Anthropic backend by injecting cache_control into requests via rig-core's additional_params, with configurable cache retention (none/short/long) and accurate cost tracking for cache write surcharges and read discounts. It continues work from PR #291 and adapts it to the new declarative provider registry from PR #618.
Changes:
- Adds `CacheRetention` enum with `FromStr`/`Display` and a `cache_write_multiplier` method on the `LlmProvider` trait, configurable via the `ANTHROPIC_CACHE_RETENTION` env var
- Extends `CompletionResponse`/`ToolCompletionResponse`/`TokenUsage` with `cache_read_input_tokens` and `cache_creation_input_tokens` fields, with proper cost accounting in `CostGuard`
- Updates all mock/test providers and proxy response types to include the new cache token fields
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
Summary per file:

| File | Description |
|---|---|
| `src/config/llm.rs` | New `CacheRetention` enum with None/Short/Long variants, `FromStr`, `Display` |
| `src/config/mod.rs` | Re-exports `CacheRetention` |
| `src/llm/provider.rs` | Adds `cache_read_input_tokens`/`cache_creation_input_tokens` to response types and `cache_write_multiplier()` to the `LlmProvider` trait |
| `src/llm/rig_adapter.rs` | Core implementation: cache injection via `additional_params`, `extract_cache_creation`, `supports_prompt_cache`, cache debug logging, new tests |
| `src/llm/mod.rs` | Anthropic provider factory reads `ANTHROPIC_CACHE_RETENTION` and calls `with_cache_retention()` |
| `src/llm/reasoning.rs` | Propagates cache fields through `TokenUsage` |
| `src/agent/dispatcher.rs` | Passes cache fields and write multiplier to `CostGuard::record_llm_call` |
| `src/agent/cost_guard.rs` | Updated cost formula with cache read discount (10%) and write surcharge, new tests |
| `src/llm/nearai_chat.rs` | Zero-fills cache fields for non-Anthropic provider |
| `src/worker/api.rs` | Adds cache fields to proxy response types with `#[serde(default)]` |
| `src/orchestrator/api.rs` | Passes cache fields through proxy responses |
| `.env.example` | Documents `ANTHROPIC_CACHE_RETENTION` configuration |
| `src/llm/failover.rs`, `src/llm/smart_routing.rs`, `src/llm/response_cache.rs` | Zero-fills cache fields in test mock providers |
| `tests/support/trace_llm.rs`, `tests/provider_chaos.rs`, `tests/openai_compat_integration.rs`, `src/testing.rs` | Zero-fills cache fields in test providers |
```rust
    /// Returns `1.0` by default (no surcharge). Anthropic providers return
    /// `1.25` for 5-minute TTL or `2.0` for 1-hour TTL.
    fn cache_write_multiplier(&self) -> Decimal {
        Decimal::ONE
    }
}

/// Sanitize a message list to ensure tool_use / tool_result integrity.
```
The `cache_write_multiplier()` method has a default implementation returning `Decimal::ONE`, but none of the wrapper providers (`FailoverProvider`, `SmartRoutingProvider`, `CachedProvider`, `CircuitBreakerProvider`, `RetryProvider`, `RecordingLlm`) delegate it to their inner provider. Since `build_provider_chain()` wraps the base Anthropic `RigAdapter` in up to 6 decorator layers, calling `self.llm().cache_write_multiplier()` in the dispatcher (line 279) will always return `Decimal::ONE` instead of the actual 1.25× or 2.0× multiplier from the `RigAdapter`.

Each wrapper needs to delegate `cache_write_multiplier` like it already delegates `cost_per_token`, e.g. in the `RetryProvider` impl: `fn cache_write_multiplier(&self) -> Decimal { self.inner.cache_write_multiplier() }`. This applies to all 6 wrapper types: `RetryProvider`, `SmartRoutingProvider`, `FailoverProvider`, `CircuitBreakerProvider`, `CachedProvider`, and `RecordingLlm`.
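A self-contained sketch of the delegation pattern being requested, assuming each decorator stores its wrapped provider in a field named `inner` (the real trait has many more methods, and the real wrappers may hold trait objects rather than generics):

```rust
use rust_decimal::Decimal;

trait LlmProvider {
    /// Default: no surcharge. A decorator that does not override this
    /// silently masks the inner provider's real multiplier.
    fn cache_write_multiplier(&self) -> Decimal {
        Decimal::ONE
    }
}

struct RigAdapter;
impl LlmProvider for RigAdapter {
    fn cache_write_multiplier(&self) -> Decimal {
        Decimal::new(125, 2) // 1.25x, e.g. for the 5-minute TTL
    }
}

struct RetryProvider<P: LlmProvider> {
    inner: P,
}
impl<P: LlmProvider> LlmProvider for RetryProvider<P> {
    fn cache_write_multiplier(&self) -> Decimal {
        // Delegate, mirroring how cost_per_token is already forwarded.
        self.inner.cache_write_multiplier()
    }
}
```

Without the override in `RetryProvider`, the trait default would return `Decimal::ONE` regardless of what `RigAdapter` reports, which is exactly the bug described above.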
```diff
-    /// Returns `1.0` by default (no surcharge). Anthropic providers return
-    /// `1.25` for 5-minute TTL or `2.0` for 1-hour TTL.
-    fn cache_write_multiplier(&self) -> Decimal {
-        Decimal::ONE
-    }
-}
+    /// Implementors should return `1.0` when there is no surcharge.
+    /// Anthropic providers return `1.25` for 5-minute TTL or `2.0` for 1-hour TTL.
+    fn cache_write_multiplier(&self) -> Decimal;
+}

 /// Sanitize a message list to ensure tool_use / tool_result integrity.
```
Fixed in 5797660. All 6 wrapper providers (RetryProvider, CircuitBreakerProvider, FailoverProvider, SmartRoutingProvider, CachedProvider, RecordingLlm) now delegate both cache_write_multiplier() and the new cache_read_discount() to their inner provider, matching how they already delegate cost_per_token().
```rust
        cache_write_multiplier_for(CacheRetention::None),
        Decimal::ONE
    );
    // Short → 1.25× (25% surcharge)
    assert_eq!(
        cache_write_multiplier_for(CacheRetention::Short),
        Decimal::new(125, 2)
    );
    // Long → 2.0× (100% surcharge)
    assert_eq!(
        cache_write_multiplier_for(CacheRetention::Long),
        Decimal::TWO
    );
}

/// Helper to compute the multiplier without constructing a full RigAdapter.
fn cache_write_multiplier_for(retention: CacheRetention) -> rust_decimal::Decimal {
    match retention {
        CacheRetention::None => rust_decimal::Decimal::ONE,
        CacheRetention::Short => rust_decimal::Decimal::new(125, 2),
        CacheRetention::Long => rust_decimal::Decimal::TWO,
    }
}
```
The `test_cache_write_multiplier_values` test doesn't actually test the `RigAdapter::cache_write_multiplier()` method. It uses a standalone helper function `cache_write_multiplier_for` (lines 1076-1082) that duplicates the match logic. If the implementation in `RigAdapter` diverges from this helper, the test would still pass while the real code is wrong. Consider testing the actual trait method on a `RigAdapter` instance instead.
```diff
-        cache_write_multiplier_for(CacheRetention::None),
-        Decimal::ONE
-    );
-    // Short → 1.25× (25% surcharge)
-    assert_eq!(
-        cache_write_multiplier_for(CacheRetention::Short),
-        Decimal::new(125, 2)
-    );
-    // Long → 2.0× (100% surcharge)
-    assert_eq!(
-        cache_write_multiplier_for(CacheRetention::Long),
-        Decimal::TWO
-    );
-}
-
-/// Helper to compute the multiplier without constructing a full RigAdapter.
-fn cache_write_multiplier_for(retention: CacheRetention) -> rust_decimal::Decimal {
-    match retention {
-        CacheRetention::None => rust_decimal::Decimal::ONE,
-        CacheRetention::Short => rust_decimal::Decimal::new(125, 2),
-        CacheRetention::Long => rust_decimal::Decimal::TWO,
-    }
-}
+        RigAdapter::cache_write_multiplier(CacheRetention::None),
+        Decimal::ONE
+    );
+    // Short → 1.25× (25% surcharge)
+    assert_eq!(
+        RigAdapter::cache_write_multiplier(CacheRetention::Short),
+        Decimal::new(125, 2)
+    );
+    // Long → 2.0× (100% surcharge)
+    assert_eq!(
+        RigAdapter::cache_write_multiplier(CacheRetention::Long),
+        Decimal::TWO
+    );
+}
```
Acknowledged in 5797660. Constructing a real `RigAdapter` requires a rig `Model` (which needs network/provider setup), so the test uses a standalone helper that mirrors the same match arms. Added a doc comment explaining this trade-off. The `test_build_rig_request_*` tests still exercise the full pipeline end-to-end as a safety net.
```rust
        // Uncached tokens = total input - cache reads - cache writes.
        let cached_total = cache_read_input_tokens.saturating_add(cache_creation_input_tokens);
        let uncached_input = input_tokens.saturating_sub(cached_total);
        let cache_read_cost = input_rate * Decimal::from(cache_read_input_tokens) / dec!(10);
```
The cache read discount is hardcoded to 90% (dividing by 10) on line 179, which is Anthropic-specific. OpenAI also reports cached input tokens (via rig-core's `Usage::cached_input_tokens` field) but uses a 50% discount instead of 90%. Since the `RigAdapter` populates `cache_read_input_tokens` from `response.usage.cached_input_tokens` for all providers (lines 539/619), this will miscalculate costs when an OpenAI-compatible provider reports cached tokens.

Consider making the cache read discount configurable per-provider (similar to `cache_write_multiplier`) rather than hardcoding Anthropic's 10% rate.
Fixed in 5797660. Added a `cache_read_discount()` method to the `LlmProvider` trait (default: `Decimal::ONE`, i.e. no discount). `RigAdapter` overrides it to `10` for Anthropic, so cached reads are billed at one tenth of the input rate (90% off). OpenAI providers can override it to `2` (50% off) when cache support is added. `CostGuard` now divides by the provider-supplied discount instead of hardcoding `/ 10`.
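Putting the pieces together, the billing formula the discussion converges on looks roughly like the following. Variable and function names are illustrative, not the repo's exact `CostGuard` signatures:

```rust
use rust_decimal::Decimal;

/// Illustrative per-call cost with cache accounting. `read_discount` is a
/// divisor supplied by the provider (10 for Anthropic = 90% off; 2 would
/// model OpenAI's 50% off); `write_multiplier` is 1.25 (5m TTL) or 2.0 (1h).
fn llm_call_cost(
    input_tokens: u32,
    cache_read: u32,
    cache_write: u32,
    input_rate: Decimal,
    read_discount: Decimal,
    write_multiplier: Decimal,
) -> Decimal {
    // Uncached tokens = total input - cache reads - cache writes.
    let cached_total = cache_read.saturating_add(cache_write);
    let uncached = input_tokens.saturating_sub(cached_total);
    Decimal::from(uncached) * input_rate
        + Decimal::from(cache_read) * input_rate / read_discount
        + Decimal::from(cache_write) * input_rate * write_multiplier
}
```

For a request whose entire prompt is served from cache (`cache_read == input_tokens`, no writes), this bills 10% of the normal input cost at Anthropic rates, matching the 90% savings asserted by `test_cache_discount_reduces_cost`.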
fix: delegate cache_write_multiplier through provider wrappers and make cache_read_discount configurable

The 6 decorator providers (Retry, CircuitBreaker, Failover, SmartRouting, CachedProvider, RecordingLlm) did not delegate cache_write_multiplier() to their inner provider, causing it to always return 1.0 instead of the actual 1.25x/2.0x from RigAdapter. This fix adds delegation for both cache_write_multiplier() and the new cache_read_discount() method.

Also makes the cache read discount per-provider instead of hardcoding Anthropic's 90% discount (÷10). OpenAI uses 50% (÷2), so the discount is now returned by each provider via the LlmProvider trait.

Addresses review feedback on PR #660.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 22 out of 22 changed files in this pull request and generated 1 comment.
```rust
impl std::str::FromStr for CacheRetention {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_lowercase().as_str() {
            "none" | "off" | "disabled" => Ok(Self::None),
            "short" | "5m" | "ephemeral" => Ok(Self::Short),
            "long" | "1h" => Ok(Self::Long),
            _ => Err(format!(
                "invalid cache retention '{}', expected one of: none, short, long",
                s
            )),
        }
    }
}
```
The `CacheRetention` enum implements `FromStr` and `Display` with several aliases (e.g., "off", "disabled", "5m", "ephemeral", "1h"), but there are no unit tests for the parsing logic. The analogous `SslMode` enum in `src/config/database.rs:202-226` has tests for round-trip serialization, case-insensitivity, and invalid input. Consider adding similar tests for `CacheRetention::from_str` to verify all the accepted aliases and the error case.
Fixed in cddd796. Added 5 unit tests for `CacheRetention::from_str`: primary values, all aliases (off/disabled/5m/ephemeral/1h), case-insensitivity, invalid input error message, and `Display` round-trip.
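One of those cases might look like the following. The test name is hypothetical, and derives such as `PartialEq`/`Debug` on `CacheRetention` are assumed:

```rust
#[test]
fn cache_retention_parses_aliases_case_insensitively() {
    // Aliases accepted by from_str, per the implementation shown above
    // (input is lowercased before matching).
    assert_eq!("OFF".parse::<CacheRetention>(), Ok(CacheRetention::None));
    assert_eq!("5m".parse::<CacheRetention>(), Ok(CacheRetention::Short));
    assert_eq!("Ephemeral".parse::<CacheRetention>(), Ok(CacheRetention::Short));
    assert_eq!("1H".parse::<CacheRetention>(), Ok(CacheRetention::Long));
    // Unknown values surface the descriptive error.
    assert!("weekly".parse::<CacheRetention>().is_err());
}
```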
test: add CacheRetention FromStr/Display unit tests

Tests cover primary values, aliases (off/disabled/5m/ephemeral/1h), case-insensitivity, invalid input error, and Display round-trip.

Addresses Copilot review feedback on PR #660.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 22 out of 22 changed files in this pull request and generated no new comments.
feat: enable Anthropic prompt caching via automatic cache_control injection (nearai#660)

* feat(llm): add Anthropic prompt caching and cache token tracking
* feat(cost): apply 90% cache discount to prompt-cached tokens in CostGuard
* refactor(cache): scope cache_control to Anthropic backend and validate model support
* fix(cache): validate model at construction and propagate cache metrics through proxy
* feat(llm): add configurable cache retention with write surcharge
* docs: fix stale cache_retention field comment
* fix: resolve CI failures after upstream merge
* fix: address Copilot review feedback
* fix: adapt prompt caching to registry architecture and add missing cache fields
* fix: delegate cache_write_multiplier through provider wrappers and make cache_read_discount configurable
* style: cargo fmt
* test: add CacheRetention FromStr/Display unit tests

Co-authored-by: Andrey <canvi@2bb.dev>
Co-authored-by: Andrey Gruzdev <44225021+Canvinus@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Continuation of #291 by @Canvinus.
Enable Anthropic prompt caching for the direct Anthropic backend with configurable cache retention and accurate write surcharge tracking. Uses Anthropic's automatic caching: a top-level `cache_control` field that the API uses to auto-place cache breakpoints at the last cacheable block.

Configuration
`ANTHROPIC_CACHE_RETENTION` accepts `none`, `short` (default), or `long`:

- `none`: do not inject `cache_control`
- `short`: 5-minute ephemeral TTL (1.25× write surcharge)
- `long`: 1-hour TTL (2.0× write surcharge)

Only the direct Anthropic backend (`LLM_BACKEND=anthropic`) benefits. Other backends pass through zeroed cache fields.
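In `.env` terms, mirroring what the PR documents in `.env.example` (the comment wording here is illustrative):

```bash
# Prompt cache retention for the direct Anthropic backend.
# none = disabled, short = 5m ephemeral (default), long = 1h TTL.
ANTHROPIC_CACHE_RETENTION=short
```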
Changes from original

- Adapted `CacheRetention` config and cache injection to work with `RegistryProviderConfig` (replaces the removed `AnthropicDirectConfig`, `LlmBackend` enum, and per-backend config types)
- `ANTHROPIC_CACHE_RETENTION` env var parsed in `create_anthropic_from_registry()` instead of the removed `LlmConfig::resolve()` Anthropic branch
- Added `cache_read_input_tokens`/`cache_creation_input_tokens` fields to mock providers added on main after PR #291 branched (`response_cache.rs`, `dispatcher.rs`, `provider_chaos.rs`, `trace_llm.rs`)
- Suppressed `clippy::too_many_arguments` on `record_llm_call` and `build_rig_request`

Original PR
#291 — feat: enable Anthropic prompt caching via cache_control injection
Review comments addressed
All 11 review comments from Copilot and Gemini were already resolved by @Canvinus in the original PR's follow-up commits (model validation, cost tracking, proxy passthrough, overflow protection, etc.).
Test plan
- `cargo clippy --all --all-features`: zero warnings
- `cargo fmt`: clean
- Short retention injects `cache_control: {"type": "ephemeral"}` via additional_params
- Long retention injects `cache_control: {"type": "ephemeral", "ttl": "1h"}`
- `none` omits `cache_control` entirely

Co-Authored-By: Canvinus <44225021+Canvinus@users.noreply.github.com>
Generated with Claude Code