-
Notifications
You must be signed in to change notification settings - Fork 19.6k
It worth not hiding exceptions #21432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
What do you think about logging of exceptions, instead of hiding them? + There is an example of such hidden exception when head_shards and q_seq_shards were just not initialized because get_dist()==None.
And despite hardware supports flash attention - this code silently falls back to the standard with O(N^2) memory.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #21432 +/- ##
==========================================
+ Coverage 74.94% 82.74% +7.79%
==========================================
Files 565 567 +2
Lines 55224 56466 +1242
Branches 8610 8825 +215
==========================================
+ Hits 41386 46720 +5334
+ Misses 11880 7581 -4299
- Partials 1958 2165 +207
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enhances the robustness and observability of the dot_product_attention function by adding logging for fallback scenarios and refactoring the attention implementation selection. The changes improve the user's ability to diagnose performance and compatibility issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Please run pre-commit run --all-files
|
@WIgor , Thanks for the PR! Can you reformat the code? Then I can merge it. |
|
As far as I see it does nothing |
|
Ahh ok - it seems working now. |
What do you think about logging of exceptions, instead of hiding them? + There is an example of such hidden exception when head_shards and q_seq_shards were just not initialized because get_dist()==None.