
Fix/prompt caching support #2051


Open

wants to merge 6 commits into main

Conversation

@gulbaki (Contributor) commented Jul 2, 2025

Related Issues

Proposed Changes:

Anthropic prompt caching is no longer behind a beta flag and can now be used without specifying a beta header.
Additionally, users can now include multiple beta features using a comma-separated format, e.g., beta1,beta2, as described in [Anthropic’s documentation](https://docs.anthropic.com/en/api/beta-headers#multiple-beta-features).

However, if this approach is not suitable for us, we can revert to the previous behavior where only a single beta feature was supported.

Here’s how I approached it: with each newly introduced beta feature, we can add a new conditional block to make it easier for users to adopt them.
But as mentioned, if this is not desirable, I can also simplify the logic to enforce single-beta usage again.

Additionally, caching now uses a default TTL of 5 minutes, and the code has been updated to allow a 1-hour TTL as a beta feature (see the sketch below).
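
For illustration, here is a rough sketch of how this could look from the user's side. The 1-hour-TTL beta name extended-cache-ttl-2025-04-11 and passing extra_headers through generation_kwargs are assumptions based on Anthropic's docs and the existing prompt_caching.py example, not something this PR prescribes:

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

# Default 5-minute cache: no beta header needed, just mark the message.
llm = AnthropicChatGenerator()  # reads ANTHROPIC_API_KEY from the environment
system_message = ChatMessage.from_system("long, reusable system prompt " * 200)
system_message._meta["cache_control"] = {"type": "ephemeral"}

# Opting into the 1-hour TTL (beta): pass the beta header(s), comma-separated if there
# are several, and request the longer TTL on the message itself.
llm_1h = AnthropicChatGenerator(
    generation_kwargs={"extra_headers": {"anthropic-beta": "extended-cache-ttl-2025-04-11"}}
)
system_message._meta["cache_control"] = {"type": "ephemeral", "ttl": "1h"}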

How did you test it?

I added tests that cover the work I've done, but some of the existing test cases have either become invalid or are now broken by the updated behavior. That's why, instead of directly removing or editing them, I wanted to consult with you first and proceed accordingly.

  • test_prompt_caching_enabled: The content of this test can remain unchanged, but its name could be updated to something more general, like test_beta_enabled.
  • test_prompt_caching_cache_control_without_extra_headers: I had to make changes here because caching can now work even without the beta header, so the original purpose of the test no longer applies.
  • test_run_with_prompt_caching: The function name could also be revised. The content remains valid since we are essentially testing whether the beta feature is applied or not.
  • test_to_dict_with_prompt_caching: Similarly, the name could be updated.
  • test_from_dict_with_prompt_caching: Same here — renaming might make it more accurate.

Notes for the reviewer

I made sure not to delete or modify any existing work without consulting you first. That’s why I’m open to updating the test code according to your preferences.

Additionally, regarding the beta headers accepting multiple values like beta1,beta2, I can revert and adjust this behavior as well based on your feedback.

I haven't made any changes to the example usages for now; I will update them based on the decision we make.

Checklist

  • [x] I have read the contributors guidelines and the code of conduct
  • [x] I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.

@gulbaki gulbaki requested a review from a team as a code owner July 2, 2025 13:25
@gulbaki gulbaki requested review from anakin87 and removed request for a team July 2, 2025 13:25
@CLAassistant commented Jul 2, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions bot added the integration:anthropic and type:documentation labels Jul 2, 2025
@anakin87 anakin87 requested a review from vblagoje July 2, 2025 14:07
@anakin87 (Member) commented Jul 2, 2025

@gulbaki thanks for the PR.
Please fix type errors. You can reproduce them locally by running hatch run test:types.

@vblagoje vblagoje removed the request for review from anakin87 July 3, 2025 09:01
@anakin87 (Member) commented Jul 3, 2025

I spoke with @vblagoje, who will take care of reviewing this PR.

@vblagoje (Member) commented Jul 3, 2025

@gulbaki Since these prompt cache headers move in and out of beta, I suggest we shift the responsibility to the user. My reasoning:

  • This is an advanced feature used by power users — they know how to set extra_headers
  • It keeps this Haystack integration leaner and avoids future maintenance overhead tied to Anthropic's internal changes

That basically means we should forget about these headers and not check whether they are properly set for the 1-hour TTL. Let the power user worry about it, but don't get in their way.

Does that make sense to you? I'd focus on proper support of messages as described in https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

From what I see in the code, we seem to support caching of system messages only, is that true? It seems like Anthropic now allows caching up to a certain regular message, including the system message. Please investigate.

I'd focus on detecting cache_control in ChatMessage meta and transferring it to the proper Anthropic format. Look at https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#what-can-be-cached - the implication is that we can now cache up to a certain regular message? wdyt?
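
To make the idea concrete, here is a minimal sketch (illustrative only, not this PR's actual converter) of detecting cache_control in ChatMessage meta and copying it onto the corresponding Anthropic content block; tool-result handling is omitted for brevity:

from typing import Any, Dict, List
from haystack.dataclasses import ChatMessage, ChatRole

def to_anthropic_messages(messages: List[ChatMessage]) -> Dict[str, Any]:
    # Illustrative name and shape; the real integration has its own conversion code.
    system_blocks: List[Dict[str, Any]] = []
    chat_messages: List[Dict[str, Any]] = []
    for msg in messages:
        block: Dict[str, Any] = {"type": "text", "text": msg.text or ""}
        if msg.meta.get("cache_control"):
            # Forward cache_control as-is; Anthropic caches the prefix up to this block.
            block["cache_control"] = msg.meta["cache_control"]
        if msg.is_from(ChatRole.SYSTEM):
            system_blocks.append(block)
        else:
            # Tool results would need mapping to user-role tool_result blocks; skipped here.
            chat_messages.append({"role": msg.role.value, "content": [block]})
    return {"system": system_blocks, "messages": chat_messages}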

@gulbaki (Contributor, Author) commented Jul 4, 2025


Yes, I agree with what you said. I'm adjusting it so that cache_control is added for all message types, not just the system message.

EDIT:
I made the additions as you suggested and added two tests to cover the changes.
The tests ensure cache_control is forwarded for all four block types (system_msg, user_msg, assistant_msg, tool_res); a simplified sketch follows below.
As you mentioned, I've left the handling of extra features to the user.
I hope I understood what you said correctly 🙂
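
A pytest-style sketch of the kind of check described above, reusing the hypothetical to_anthropic_messages converter from the earlier sketch (the real tests also cover tool results, which are omitted here):

import pytest
from haystack.dataclasses import ChatMessage

@pytest.mark.parametrize(
    "message",
    [
        ChatMessage.from_system("system prompt"),
        ChatMessage.from_user("user prompt"),
        ChatMessage.from_assistant("assistant reply"),
    ],
)
def test_cache_control_is_forwarded(message):
    message._meta["cache_control"] = {"type": "ephemeral"}
    payload = to_anthropic_messages([message])  # hypothetical converter sketched earlier
    blocks = payload["system"] or [b for m in payload["messages"] for b in m["content"]]
    assert blocks[0]["cache_control"] == {"type": "ephemeral"}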

@vblagoje (Member) commented Jul 4, 2025

Thanks for the updates @gulbaki - I'll try your branch today and verify that it all works as intended on real examples. We also have an examples dir in this integration with prompt_caching.py that we can use to verify prompt caching still works as expected. Let me update that file while I experiment with your new prompt caching implementation.

@vblagoje (Member) commented Jul 4, 2025

@gulbaki I can't get this prompt caching to work. I keep getting:

Cache usage details: {'cache_creation_input_tokens': None, 'cache_read_input_tokens': None, 'server_tool_use': None, 'prompt_tokens': 25, 'completion_tokens': 346}

No matter what I do. I inspected under the debugger what we send to Anthropic, and it all looks good and matches https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

Did you have better luck?

@gulbaki (Contributor, Author) commented Jul 4, 2025

@vblagoje
I hadn’t realized that we’re skipping the real API key tests for Anthropic. Sorry about that.
I’ll probably get the same output as well.
I haven't tried it myself because I don’t have an API key.
Do you know if Anthropic offers any kind of free access or test credits?
If so, I’d be happy to try it out and investigate further.

@vblagoje (Member) commented Jul 4, 2025

Deal, reach out to me via email and I'll send you a temp key so we can sort this one out: https://github.com/vblagoje

@gulbaki (Contributor, Author) commented Jul 4, 2025

@vblagoje
I found an API key with some remaining credits and gave it a try — none of the tests are failing on my end.
Which test was giving you the error exactly?
You can take a look at the issue in the screenshot below.
[screenshot omitted]

@vblagoje (Member) commented Jul 4, 2025

Ah, I was using the updated https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/anthropic/example/prompt_caching.py example, and the token_stats never show cache usage or a cache hit. Please have a look @gulbaki

@gulbaki (Contributor, Author) commented Jul 4, 2025

@vblagoje
https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#cache-limitations
The minimum cacheable prompt length is:

  • 1024 tokens for Claude Opus 4, Claude Sonnet 4, Claude Sonnet 3.7, Claude Sonnet 3.5 and Claude Opus 3
  • 2048 tokens for Claude Haiku 3.5 and Claude Haiku 3

There's a minimum token threshold that must be met; if the input doesn't exceed it (1024 tokens here), caching won't be triggered. Since our messages didn't reach that limit, that's why we're seeing this issue.

I've added a small example below. The current prompt_caching.py example falls below this minimum, so I can fix it if you'd like.

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

claude_llm = AnthropicChatGenerator()  # reads ANTHROPIC_API_KEY from the environment

system_message = ChatMessage.from_system("Ab" * 4000)  # well above the 1024-token minimum
system_message._meta["cache_control"] = {"type": "ephemeral"}

messages = [
    system_message,
    ChatMessage.from_user("What is this?"),
]

# The first call creates the cache entry; the second one reads from it.
result1 = claude_llm.run(messages)
result2 = claude_llm.run(messages)

print(result1["replies"][0].meta["usage"])
print(result2["replies"][0].meta["usage"])

RESULT:


{'cache_creation_input_tokens': 4001, 'cache_read_input_tokens': 0, 'server_tool_use': None, 'service_tier': 'standard', 'prompt_tokens': 11, 'completion_tokens': 158}
{'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 4001, 'server_tool_use': None, 'service_tier': 'standard', 'prompt_tokens': 11, 'completion_tokens': 149}

@vblagoje (Member) commented Jul 4, 2025

Aha, ok. It worked previously, and I checked that the message was "pretty long", but I never expected it to be this short.

@mathislucka (Member) commented

I've got a question regarding this PR, maybe @gulbaki or @vblagoje can help:

From the tests, it looks like you need to set the cache_control meta on the ChatMessages manually. The messages will be cached if meta["cache_control"] is set. Is this how it works?

The most important use case for prompt caching is agents. How would this work with an Agent, where we can't set the cache_control meta on the messages?

I think the best approach would be to have a way to enable caching for all messages when we initialize the AnthropicChatGenerator. This way, we don't need to introduce Anthropic-specific params in the Agent component.

Or am I missing something?
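
A hypothetical sketch of that idea (not part of this PR): a generator-level setting that applies cache_control automatically, so an Agent never has to touch Anthropic-specific meta. The cache_control init parameter and the wrapper class are invented here for illustration:

from typing import Any, Dict, List, Optional
from haystack.dataclasses import ChatMessage

class CachingAnthropicChatGenerator:
    # Illustrative wrapper around an AnthropicChatGenerator-like component.
    def __init__(self, generator, cache_control: Optional[Dict[str, Any]] = None):
        self.generator = generator
        self.cache_control = cache_control  # e.g. {"type": "ephemeral"}

    def run(self, messages: List[ChatMessage], **kwargs):
        if self.cache_control and messages:
            # Anthropic caches the prefix up to the block marked with cache_control, so
            # marking the last message effectively caches the whole conversation so far.
            messages[-1]._meta["cache_control"] = self.cache_control
        return self.generator.run(messages=messages, **kwargs)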

@vblagoje (Member) commented Jul 4, 2025


Yes, @mathislucka, one needs to set cache_control on messages. Let's talk about it on Monday in the office, higher fidelity than GitHub. @gulbaki, please continue with this task as we previously agreed; after @mathislucka and I talk, we'll loop you in with more detail about our reasoning for any eventual modifications.

Development

Successfully merging this pull request may close these issues:

  • Update needed on Anthropic prompt caching support