
Fix/prompt caching support #2051


Open

wants to merge 6 commits into main

Conversation

@gulbaki (Contributor) commented Jul 2, 2025

Related Issues

Proposed Changes:

Anthropic prompt caching is no longer behind a beta flag and can now be used without specifying a beta header.
Additionally, users can now include multiple beta features using a comma-separated format, e.g., beta1,beta2, as described in [Anthropic’s documentation](https://docs.anthropic.com/en/api/beta-headers#multiple-beta-features).

However, if this approach is not suitable for us, we can revert to the previous behavior where only a single beta feature was supported.

Here’s how I approached it: with each newly introduced beta feature, we can add a new conditional block to make it easier for users to adopt them.
But as mentioned, if this is not desirable, I can also simplify the logic to enforce single-beta usage again.

Additionally, caching now uses a default TTL of 5 minutes, and the code has been updated to allow a 1-hour TTL as a beta feature (see the sketch below).
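
For illustration, here is a rough sketch of how this could look from the user's side. The 1-hour-TTL beta name extended-cache-ttl-2025-04-11 and passing extra_headers through generation_kwargs are assumptions based on Anthropic's docs and the existing prompt_caching.py example, not something this PR prescribes:

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

# Default 5-minute cache: no beta header needed, just mark the message.
llm = AnthropicChatGenerator()  # reads ANTHROPIC_API_KEY from the environment
system_message = ChatMessage.from_system("long, reusable system prompt " * 200)
system_message._meta["cache_control"] = {"type": "ephemeral"}

# Opting into the 1-hour TTL (beta): pass the beta header(s), comma-separated if there
# are several, and request the longer TTL on the message itself.
llm_1h = AnthropicChatGenerator(
    generation_kwargs={"extra_headers": {"anthropic-beta": "extended-cache-ttl-2025-04-11"}}
)
system_message._meta["cache_control"] = {"type": "ephemeral", "ttl": "1h"}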

How did you test it?

I added tests that cover the work I've done, but some of the existing test cases have either become invalid or are now broken by the updated behavior. That's why, instead of directly removing or editing them, I wanted to consult with you first and proceed accordingly.

  • test_prompt_caching_enabled: The content of this test can remain unchanged, but its name could be updated to something more general, like test_beta_enabled.
  • test_prompt_caching_cache_control_without_extra_headers: I had to make changes here because caching can now work even without the beta header, so the original purpose of the test no longer applies.
  • test_run_with_prompt_caching: The function name could also be revised. The content remains valid since we are essentially testing whether the beta feature is applied or not.
  • test_to_dict_with_prompt_caching: Similarly, the name could be updated.
  • test_from_dict_with_prompt_caching: Same here — renaming might make it more accurate.

Notes for the reviewer

I made sure not to delete or modify any existing work without consulting you first. That’s why I’m open to updating the test code according to your preferences.

Additionally, regarding the beta headers accepting multiple values like beta1,beta2, I can revert and adjust this behavior as well based on your feedback.

I haven't made any changes to the example usages for now; I will update them based on the decision we make.

Checklist

  • [x] I have read the contributors guidelines and the code of conduct
  • [x] I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.

@gulbaki gulbaki requested a review from a team as a code owner July 2, 2025 13:25
@gulbaki gulbaki requested review from anakin87 and removed request for a team July 2, 2025 13:25
@CLAassistant commented Jul 2, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions bot added the integration:anthropic and type:documentation labels Jul 2, 2025
@anakin87 anakin87 requested a review from vblagoje July 2, 2025 14:07
@anakin87 (Member) commented Jul 2, 2025

@gulbaki thanks for the PR.
Please fix type errors. You can reproduce them locally by running hatch run test:types.

@vblagoje vblagoje removed the request for review from anakin87 July 3, 2025 09:01
@anakin87 (Member) commented Jul 3, 2025

I spoke with @vblagoje, who will take care of reviewing this PR.

@vblagoje (Member) commented Jul 3, 2025

@gulbaki Since these prompt cache headers move in and out of beta, I suggest we shift the responsibility to the user. My reasoning:

  • This is an advanced feature used by power users — they know how to set extra_headers
  • It keeps this Haystack integration leaner and avoids future maintenance overhead tied to Anthropic's internal changes

That basically means we should forget about these headers and not check whether they are properly set for the 1-hour TTL. Let the power user worry about it, but don't get in their way.

Does that make sense to you? I'd focus on proper support of messages as described in https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

From what I see in the code, we seem to support caching of system messages only, is that true? It seems like Anthropic now allows caching up to a certain regular message, including the system message. Please investigate.

I'd focus on detecting cache_control in ChatMessage meta and transferring it to the proper Anthropic format. Look at https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#what-can-be-cached - the implication is that we can now cache up to a certain regular message? wdyt?
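
To make the idea concrete, here is a minimal sketch (illustrative only, not this PR's actual converter) of detecting cache_control in ChatMessage meta and copying it onto the corresponding Anthropic content block; tool-result handling is omitted for brevity:

from typing import Any, Dict, List
from haystack.dataclasses import ChatMessage, ChatRole

def to_anthropic_messages(messages: List[ChatMessage]) -> Dict[str, Any]:
    # Illustrative name and shape; the real integration has its own conversion code.
    system_blocks: List[Dict[str, Any]] = []
    chat_messages: List[Dict[str, Any]] = []
    for msg in messages:
        block: Dict[str, Any] = {"type": "text", "text": msg.text or ""}
        if msg.meta.get("cache_control"):
            # Forward cache_control as-is; Anthropic caches the prefix up to this block.
            block["cache_control"] = msg.meta["cache_control"]
        if msg.is_from(ChatRole.SYSTEM):
            system_blocks.append(block)
        else:
            # Tool results would need mapping to user-role tool_result blocks; skipped here.
            chat_messages.append({"role": msg.role.value, "content": [block]})
    return {"system": system_blocks, "messages": chat_messages}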

@gulbaki (Contributor, Author) commented Jul 4, 2025


Yes, I agree with what you said. I'm adjusting it so that cache_control is added for all message types, not just the system message.

EDIT:
I made the additions as you suggested and added two tests to cover the changes.
The tests ensure cache_control is forwarded for all four block types (system_msg, user_msg, assistant_msg, tool_res); a simplified sketch follows below.
As you mentioned, I've left the handling of extra features to the user.
I hope I understood what you said correctly 🙂
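
A pytest-style sketch of the kind of check described above, reusing the hypothetical to_anthropic_messages converter from the earlier sketch (the real tests also cover tool results, which are omitted here):

import pytest
from haystack.dataclasses import ChatMessage

@pytest.mark.parametrize(
    "message",
    [
        ChatMessage.from_system("system prompt"),
        ChatMessage.from_user("user prompt"),
        ChatMessage.from_assistant("assistant reply"),
    ],
)
def test_cache_control_is_forwarded(message):
    message._meta["cache_control"] = {"type": "ephemeral"}
    payload = to_anthropic_messages([message])  # hypothetical converter sketched earlier
    blocks = payload["system"] or [b for m in payload["messages"] for b in m["content"]]
    assert blocks[0]["cache_control"] == {"type": "ephemeral"}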

@vblagoje (Member) commented Jul 4, 2025

Thanks for the updates @gulbaki - I'll try your branch today and verify that it all works as intended on real examples. We also have an examples dir in this integration with prompt_caching.py that we can use to verify prompt caching still works as expected. Let me update that file while I experiment with your new prompt caching implementation.

@vblagoje (Member) commented Jul 4, 2025

@gulbaki I can't get this prompt caching to work. I keep getting:

Cache usage details: {'cache_creation_input_tokens': None, 'cache_read_input_tokens': None, 'server_tool_use': None, 'prompt_tokens': 25, 'completion_tokens': 346}

No matter what I do. I inspected under the debugger what we send to Anthropic, and it all looks good and matches https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

Did you have better luck?

@gulbaki (Contributor, Author) commented Jul 4, 2025

@vblagoje
I hadn’t realized that we’re skipping the real API key tests for Anthropic. Sorry about that.
I’ll probably get the same output as well.
I haven't tried it myself because I don’t have an API key.
Do you know if Anthropic offers any kind of free access or test credits?
If so, I’d be happy to try it out and investigate further.

@vblagoje (Member) commented Jul 4, 2025

Deal, reach out to me via email and I'll send you a temp key so we can sort this one out: https://github.com/vblagoje

@gulbaki (Contributor, Author) commented Jul 4, 2025

@vblagoje
I found an API key with some remaining credits and gave it a try — none of the tests are failing on my end.
Which test was giving you the error exactly?
You can take a look at the issue in the screenshot below.
[screenshot omitted]

@vblagoje (Member) commented Jul 4, 2025

Ah, I was using the updated https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/anthropic/example/prompt_caching.py example, and the token_stats never show cache usage or a cache hit. Please have a look @gulbaki

@gulbaki (Contributor, Author) commented Jul 4, 2025

@vblagoje
https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#cache-limitations
The minimum cacheable prompt length is:

  • 1024 tokens for Claude Opus 4, Claude Sonnet 4, Claude Sonnet 3.7, Claude Sonnet 3.5 and Claude Opus 3
  • 2048 tokens for Claude Haiku 3.5 and Claude Haiku 3

There's a minimum token threshold that must be met; if the input doesn't exceed it (1024 tokens here), caching won't be triggered. Since our messages didn't reach that limit, that's why we're seeing this issue.

I've added a small example below. The current prompt_caching.py example falls below this minimum, so I can fix it if you'd like.

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

claude_llm = AnthropicChatGenerator()  # reads ANTHROPIC_API_KEY from the environment

system_message = ChatMessage.from_system("Ab" * 4000)  # well above the 1024-token minimum
system_message._meta["cache_control"] = {"type": "ephemeral"}

messages = [
    system_message,
    ChatMessage.from_user("What is this?"),
]

# The first call creates the cache entry; the second one reads from it.
result1 = claude_llm.run(messages)
result2 = claude_llm.run(messages)

print(result1["replies"][0].meta["usage"])
print(result2["replies"][0].meta["usage"])

RESULT:


{'cache_creation_input_tokens': 4001, 'cache_read_input_tokens': 0, 'server_tool_use': None, 'service_tier': 'standard', 'prompt_tokens': 11, 'completion_tokens': 158}
{'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 4001, 'server_tool_use': None, 'service_tier': 'standard', 'prompt_tokens': 11, 'completion_tokens': 149}

@vblagoje (Member) commented Jul 4, 2025

Aha, ok. It worked previously, and I checked that the message was "pretty long", but I never expected it to be this short.

@mathislucka (Member) commented

I've got a question regarding this PR, maybe @gulbaki or @vblagoje can help:

From the tests, it looks like you need to set the cache_control meta on the ChatMessages manually. The messages will be cached if meta["cache_control"] is set. Is this how it works?

The most important use case for prompt caching is agents. How would this work with an Agent, where we can't set the cache_control meta on the messages?

I think the best approach would be to have a way to enable caching for all messages when we initialize the AnthropicChatGenerator. This way, we don't need to introduce Anthropic-specific params in the Agent component.

Or am I missing something?
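
A hypothetical sketch of that idea (not part of this PR): a generator-level setting that applies cache_control automatically, so an Agent never has to touch Anthropic-specific meta. The cache_control init parameter and the wrapper class are invented here for illustration:

from typing import Any, Dict, List, Optional
from haystack.dataclasses import ChatMessage

class CachingAnthropicChatGenerator:
    # Illustrative wrapper around an AnthropicChatGenerator-like component.
    def __init__(self, generator, cache_control: Optional[Dict[str, Any]] = None):
        self.generator = generator
        self.cache_control = cache_control  # e.g. {"type": "ephemeral"}

    def run(self, messages: List[ChatMessage], **kwargs):
        if self.cache_control and messages:
            # Anthropic caches the prefix up to the block marked with cache_control, so
            # marking the last message effectively caches the whole conversation so far.
            messages[-1]._meta["cache_control"] = self.cache_control
        return self.generator.run(messages=messages, **kwargs)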

@vblagoje (Member) commented Jul 4, 2025


Yes, @mathislucka, one needs to set cache_control on messages. Let's talk about it on Monday in the office, higher fidelity than GitHub. @gulbaki, please continue with this task as we previously agreed; after @mathislucka and I talk, we'll loop you in with more detail about our reasoning for any eventual modifications.

Development

Successfully merging this pull request may close these issues:

  • Update needed on Anthropic prompt caching support