Fix Audio Classification Pipeline top_k Documentation Mismatch and Bug #35736 #35771
Conversation
Yes, this looks clean to me, and I really appreciate the broad test coverage! cc @wilke0818, does this fix the issues you saw?
Yep, just tried it on the original code I had, as well as did some stress testing of it, and it seems to work in all scenarios I can think of. Thanks for the quick responses and fixes! Also did a quick check and it doesn't seem like this issue exists in the other classification pipelines (I don't see any top_k default value setting).
Cool. Then I think we can merge this @Rocketknight1?
Rocketknight1 left a comment:
Yes, approving, and thank you for the PR!
cc @Rocketknight1 this should not need a single file on its own; this should go in the pipeline tests
Will fix, my bad for not noticing it was a separate file in the review!
…#35736 (#35771) * added condition for top_k Doc mismatch fix * initilation of test file for top_k changes * added test for returning all labels * added test for few labels * tests/test_audio_classification_top_k.py * final fix * ruff fix --------- Co-authored-by: sambhavnoobcoder <[email protected]>
Hey @Rocketknight1, sorry for the inconvenience on a merged PR, but I was looking through my past contributions and found that even though I've had PRs merged like this one, or others like 35859, 35858, 35735 or 36345, I could not see my name/image on the contributors list for this repo. Could you look into why that is happening? Is there some other process I'm missing to be listed on the board? I already posted about this on Discord first, but was told to discuss it directly on the GitHub issue itself, hence the query.
Hi @sambhavnoobcoder, do you mean here? https://github.com/huggingface/transformers/graphs/contributors If so, that's part of GitHub, not managed by us. It's sorted by total commits (PRs merged) to the repo. You'll turn up there eventually!
Yes @Rocketknight1, that was what I was looking at. Actually, I added up the number of commits from all of my PRs that were merged so far and thought that I could've been there on the basis of that number, but if the metrics/requirements for the board are something different, then I'm not sure.
Hi @sambhavnoobcoder, actually, a single merged PR counts as one "commit" to Transformers. Even though the PR branch has multiple commits, it's squashed into a single update to the main branch that is recorded as one commit. That makes it very hard for the rest of us to catch up with founders like @julien-c and @thomwolf, who were able to make lots of raw commits directly to the repo in the early days 😅
Ohh, that makes perfect sense! Haha, I was wondering if I was doing something wrong. Thanks for the quick reply.
Problem Statement
There were two issues with the audio classification pipeline's `top_k` parameter:
- When `top_k=None`, all labels should be returned, but this wasn't implemented.
- When `top_k=None` and the model had fewer labels than the default (5), the pipeline would crash (see the sketch below).

Fixes: #35736
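For illustration, here is a minimal sketch of the affected usage (the checkpoint and audio file are placeholders, not taken from the PR):

```python
from transformers import pipeline

# Placeholder checkpoint and audio file; any audio classification model applies.
classifier = pipeline("audio-classification", model="superb/wav2vec2-base-superb-ks")

# Documented behavior: top_k=None returns a score for every label.
# Before this fix it either fell back to the default of 5, or crashed
# when the model had fewer than 5 labels.
all_scores = classifier("sample.wav", top_k=None)
print(all_scores)
```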
Root Cause Analysis
After investigating the implementation, we found:
- The `__init__` method was unconditionally setting a default of `top_k=5`, even when it was explicitly set to `None`.
- The `_sanitize_parameters` method wasn't properly handling the `None` case, leading to potential crashes when models had fewer labels than the requested/default `top_k` (sketched below).
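A simplified sketch of the pre-fix behavior those two points describe (illustrative pseudologic, not the actual pipeline source):

```python
DEFAULT_TOP_K = 5

def old_effective_top_k(requested_top_k, num_labels):
    # __init__ applied the default unconditionally, so an explicit None was lost...
    top_k = DEFAULT_TOP_K if requested_top_k is None else requested_top_k
    # ...and _sanitize_parameters never capped the value to the model's label count,
    # so asking a 2-label model for its "top 5" crashed downstream.
    return top_k

print(old_effective_top_k(None, num_labels=2))  # 5, not 2 (all labels) as documented
```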
Solution
The fix involves (a simplified sketch follows the list):
- Handling `top_k=None` in initialization to respect the user's intent to get all labels.
- Updating `_sanitize_parameters` to use the model's total number of labels when `top_k=None`.
- Keeping the default of `top_k=5` when it is not explicitly specified.
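A minimal sketch of this resolution logic, assuming a standalone helper; the name and structure are mine, not the exact code merged in the PR:

```python
def resolve_top_k(requested_top_k, num_labels):
    if requested_top_k is None:
        # top_k=None now means "return every label".
        return num_labels
    # Explicit values are capped at the model's label count, so the
    # default of 5 can no longer crash a model with only 2 labels.
    return min(requested_top_k, num_labels)

print(resolve_top_k(None, num_labels=2))  # 2 -> all labels
print(resolve_top_k(5, num_labels=2))     # 2 -> capped, no crash
print(resolve_top_k(3, num_labels=10))    # 3 -> unchanged
```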
Implementation Details
Changes were made to:
- `AudioClassificationPipeline.__init__`: modified to properly handle the `None` value.
- `AudioClassificationPipeline._sanitize_parameters`: updated to use all labels when `top_k=None`.
- Added `test_audio_classification_top_k.py` with three test cases, covering `top_k=None` returning all labels and `top_k` greater than the number of available labels.
Testing
Added comprehensive tests that verify:
- When `top_k=None`, all labels are returned.
- `top_k` values are properly capped to the number of available labels (illustrated by the sketch below).
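Roughly, the added checks look like this (a hedged sketch: the model name, file path, and assertions are illustrative rather than copied from the test file):

```python
from transformers import pipeline


def test_top_k_none_returns_all_labels():
    # Placeholder checkpoint; any audio classification model works here.
    classifier = pipeline("audio-classification", model="superb/wav2vec2-base-superb-ks")
    num_labels = classifier.model.config.num_labels
    outputs = classifier("sample.wav", top_k=None)
    assert len(outputs) == num_labels


def test_top_k_is_capped_to_available_labels():
    classifier = pipeline("audio-classification", model="superb/wav2vec2-base-superb-ks")
    num_labels = classifier.model.config.num_labels
    outputs = classifier("sample.wav", top_k=num_labels + 10)
    assert len(outputs) == num_labels
```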
Test Results
All test cases pass successfully, confirming the fix works as intended while maintaining backward compatibility.
Note About Warning
During test execution, we see a deprecation warning: "UserWarning: Passing gradient_checkpointing to a config initialization is deprecated and will be removed in v5 Transformers. Using model.gradient_checkpointing_enable() instead..."

This warning is unrelated to the `top_k` changes. It comes from the model configuration initialization and concerns the deprecation of setting gradient checkpointing via the config, a broader change planned for Transformers v5 that moves gradient checkpointing configuration from model initialization to explicit method calls. It doesn't affect the audio classification pipeline's functionality or the `top_k` parameter behavior. The warning could be avoided by switching to a different multi-label model that already supports gradient checkpointing, but I decided to keep the same model in place since the warning doesn't affect this PR in any way.
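For reference, the explicit call the warning points to looks like this (the model class and checkpoint are examples, not part of this PR):

```python
from transformers import Wav2Vec2ForSequenceClassification

# Example model; the method is available on any PreTrainedModel.
model = Wav2Vec2ForSequenceClassification.from_pretrained("superb/wav2vec2-base-superb-ks")

# Preferred over passing gradient_checkpointing=True to the config at init time.
model.gradient_checkpointing_enable()
```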
cc: @Rocketknight1