Fix/prompt caching support #2051
base: main
Conversation
…trol and multiple beta header add and add test
@gulbaki thanks for the PR.
I spoke with @vblagoje, who will take care of reviewing this PR.
@gulbaki Since these prompt cache headers move in and out of beta, I suggest we shift the responsibility to the user. My reasoning:
That basically means: forget these headers and forget checking whether they are properly set for 1h or not. Let the super user worry about it, but let's not stand in their way. Does that make sense to you? I'd focus on proper support of messages as described in https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching From what I see in the code, we seem to support caching of system messages only. Is that true? It seems like Anthropic now allows caching messages up to a certain regular message, including the system message. Please investigate. I'd worry about detecting cache_control in ChatMessage meta and transferring it to the proper Anthropic format. Look at https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#what-can-be-cached - the implication is that we can now cache up to a certain regular message, wdyt?
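For illustration, a minimal sketch of the meta-to-Anthropic translation being discussed. The helper name and exact conversion are assumptions for the sketch, not this PR's actual code; only the `cache_control` placement on the content block follows Anthropic's documented format:

```python
from haystack.dataclasses import ChatMessage

def to_anthropic_format(message: ChatMessage) -> dict:
    # Hypothetical converter: carry cache_control from Haystack's
    # ChatMessage meta onto the Anthropic content block, where it
    # marks a cache breakpoint (everything up to it gets cached).
    block = {"type": "text", "text": message.text}
    if "cache_control" in message.meta:
        block["cache_control"] = message.meta["cache_control"]
    return {"role": message.role.value, "content": [block]}

msg = ChatMessage.from_user("...long, stable context...")
msg.meta["cache_control"] = {"type": "ephemeral"}
to_anthropic_format(msg)
# -> {"role": "user", "content": [{"type": "text", "text": "...",
#     "cache_control": {"type": "ephemeral"}}]}
```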
Yes, I agree with what you said; I'm adjusting it so that caching applies to all message types, not just system messages. EDIT:
sys_msg, usr_msg, asst_msg, tool_res and add test
Thanks for the updates @gulbaki - I'll try your branch today and verify that it all works as intended on real examples. We also have an example dir in this integration with prompt_caching.py that we can use to verify prompt caching still works as expected. Let me update that file while I experiment with your new prompt caching implementation.
@gulbaki I can't get this prompt caching to work. I keep getting:
No matter what I do. I inspected under the debugger what we send to Anthropic, and it all looks good and in line with https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching Did you have more luck?
@vblagoje
Deal, reach out to me via email and I'll send you a temp key so we can sort this one out: https://github.com/vblagoje
@vblagoje
Ah, I was using the updated https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/anthropic/example/prompt_caching.py example and the
@vblagoje If the input doesn't exceed this value (1024 tokens), caching won't be triggered; there's a minimum token threshold that must be met. I've added a small example below, and the prompt_caching.py example is working against this rule; I can fix it if you'd like.
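The original snippet is not shown here, but a sketch under stated assumptions could look like this (the model name is a placeholder, and the repeated string is just a cheap way to clear the ~1024-token minimum):

```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

# Long enough to cross the minimum cacheable size; shorter prompts
# are processed normally and never written to the cache.
long_context = "Background the model should reuse across calls. " * 400

system_message = ChatMessage.from_system(long_context)
system_message.meta["cache_control"] = {"type": "ephemeral"}

generator = AnthropicChatGenerator(model="claude-3-5-sonnet-20240620")
result = generator.run(
    messages=[system_message, ChatMessage.from_user("Summarize the background.")]
)

# First call: usage should show cache_creation_input_tokens > 0.
# Repeat calls with the same prefix report cache_read_input_tokens instead.
print(result["replies"][0].meta.get("usage"))
```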
RESULT:
Aha, ok. Previously it worked, and I checked that the message is "pretty long", but I never expected it to be this short.
I've got a question regarding this PR, maybe @gulbaki or @vblagoje can help: the tests suggest you need to set the cache control meta on the ChatMessages manually for them to be cached. The most important use case for prompt caching is agents. How would this work with an Agent? I think it would be best to have a way to enable caching for all messages at initialization. Or am I missing something?
Yes, @mathislucka, one needs to set cache_control on messages. Let's talk about it on Monday in the office, higher fidelity than GitHub. @gulbaki, please continue with this task as we previously agreed; after @mathislucka and I talk, we'll loop you in with more detail on our reasoning for any eventual modifications.
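To make the agent question above concrete, a rough sketch of what callers have to do today: move a single `cache_control` breakpoint to the newest message before each generator call, so the growing history prefix is served from cache. The helper is hypothetical, not part of this PR:

```python
from haystack.dataclasses import ChatMessage

def mark_cache_breakpoint(messages: list[ChatMessage]) -> list[ChatMessage]:
    # Keep a single breakpoint: Anthropic caches the whole prefix up
    # to the marked block, so only the last message needs the flag.
    for m in messages:
        m.meta.pop("cache_control", None)
    messages[-1].meta["cache_control"] = {"type": "ephemeral"}
    return messages
```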
Related Issues
Proposed Changes:
Anthropic prompt caching is no longer behind a beta flag and can now be used without specifying a beta header.
Additionally, users can now include multiple beta features using a comma-separated format, e.g. `beta1,beta2`, as described in [Anthropic's documentation](https://docs.anthropic.com/en/api/beta-headers#multiple-beta-features). However, if this approach is not suitable for us, we can revert to the previous behavior where only a single beta feature was supported.
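For instance, a user could opt into several betas in one header like this (the beta names and model are placeholders; the `extra_headers` pattern follows the integration's existing examples):

```python
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

generator = AnthropicChatGenerator(
    model="claude-3-5-sonnet-20240620",
    generation_kwargs={
        # Forwarded to the API as a single comma-separated anthropic-beta header
        "extra_headers": {"anthropic-beta": "beta1,beta2"},
    },
)
```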
Here's how I approached it: with each newly introduced beta feature, we can add a new conditional block to make it easier for users to adopt it. But as mentioned, if this is not desirable, I can also simplify the logic to enforce single-beta usage again.
Additionally, caching now defaults to a 5-minute TTL, but it has been updated to also allow a 1-hour TTL as a beta feature.
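A sketch of opting into the 1-hour TTL; the `extended-cache-ttl-2025-04-11` beta name and the `ttl` field are taken from Anthropic's docs at the time of writing, and per the discussion above, keeping them current is left to the user:

```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

generator = AnthropicChatGenerator(
    model="claude-3-5-sonnet-20240620",
    generation_kwargs={
        "extra_headers": {"anthropic-beta": "extended-cache-ttl-2025-04-11"},
    },
)

msg = ChatMessage.from_system("...long, stable context...")
# Default TTL is 5 minutes; "1h" requests the beta 1-hour cache.
msg.meta["cache_control"] = {"type": "ephemeral", "ttl": "1h"}
```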
How did you test it?
I added tests that cover the work I've done, but some of the existing test cases have either become invalid or now break with the updated behavior. That's why, instead of directly removing or editing them, I wanted to consult with you first and proceed accordingly.
- `test_prompt_caching_enabled`: The content of this test can remain unchanged, but its name could potentially be updated to something more global, like `test_beta_enabled`.
- `test_prompt_caching_cache_control_without_extra_headers`: I had to make changes here because caching can now function even without the beta header. As a result, the original purpose of the test has been compromised.
- `test_run_with_prompt_caching`: The function name could also be revised. The content remains valid, since we are essentially testing whether the beta feature is applied or not.
- `test_to_dict_with_prompt_caching`: Similarly, the name could be updated.
- `test_from_dict_with_prompt_caching`: Same here; renaming might make it more accurate.

Notes for the reviewer
I made sure not to delete or modify any existing work without consulting you first. That’s why I’m open to updating the test code according to your preferences.
Additionally, regarding the beta headers accepting multiple values like `beta1,beta2`, I can revert and adjust this behavior as well based on your feedback.
I haven't made any changes to the example usages for now; I will update them based on the decision we make.
Checklist
- [x] I have read the contributors guidelines and the code of conduct
- [x] I have updated the related issue with new insights and changes
- [ ] I've used one of the conventional commit types for my PR title: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`.