
Conversation

@sambhavnoobcoder
Contributor

Problem Statement

There were two issues with the audio classification pipeline's top_k parameter:

  1. Documentation mismatch: The docs stated that when top_k=None, all labels should be returned, but this wasn't implemented
  2. Runtime error: When top_k=None and a model had fewer labels than the default (5), the pipeline would crash

Fixes #35736
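To make the failure concrete, here is a minimal, self-contained sketch of the old behaviour (illustrative only, not the actual transformers source): an explicit top_k=None was silently replaced by the default of 5, which could then exceed a small model's label count.

```python
# Sketch of the old behaviour (illustrative, not the real pipeline code).
# An explicit top_k=None was silently replaced by the default of 5,
# which can exceed the number of labels a small model actually has.

def old_postprocess(scores, top_k=None):
    top_k = 5 if top_k is None else top_k  # the user's None is lost here
    if top_k > len(scores):
        # Stand-in for the top-k selection failure in the real pipeline.
        raise ValueError(f"top_k ({top_k}) exceeds number of labels ({len(scores)})")
    return sorted(scores, reverse=True)[:top_k]

scores = [0.7, 0.3]  # a model with only two labels
try:
    old_postprocess(scores, top_k=None)
except ValueError as err:
    print(f"crash: {err}")
```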

Root Cause Analysis

After investigating the implementation, we found:

  1. The __init__ method was unconditionally setting a default top_k=5, even when explicitly set to None
  2. The _sanitize_parameters method wasn't properly handling the None case, leading to potential crashes when models had fewer labels than the requested/default top_k

Solution

The fix involves:

  1. Properly handling top_k=None in initialization to respect the user's intent to get all labels
  2. Updating _sanitize_parameters to use the model's total number of labels when top_k=None
  3. Maintaining backward compatibility by keeping the default top_k=5 when not explicitly specified
  4. Adding comprehensive tests to verify the behavior
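The first three points can be compressed into a small sketch. The sanitize_top_k helper and the _UNSET sentinel are hypothetical names used here for illustration; the real change lives inside AudioClassificationPipeline.__init__ and _sanitize_parameters.

```python
_UNSET = object()  # sentinel: distinguishes "not given" from an explicit None

def sanitize_top_k(top_k=_UNSET, num_labels=1, default=5):
    """Hypothetical helper mirroring the fixed behaviour."""
    if top_k is _UNSET:
        top_k = default            # backward compatible: default stays 5
    if top_k is None:
        top_k = num_labels         # explicit None means "return all labels"
    return min(top_k, num_labels)  # never request more labels than exist

print(sanitize_top_k(num_labels=10))        # 5: default unchanged
print(sanitize_top_k(None, num_labels=10))  # 10: all labels returned
print(sanitize_top_k(7, num_labels=3))      # 3: capped to available labels
print(sanitize_top_k(num_labels=2))         # 2: small model no longer crashes
```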

Implementation Details

Changes were made to:

  1. AudioClassificationPipeline.__init__: Modified to properly handle None value
  2. AudioClassificationPipeline._sanitize_parameters: Updated to use all labels when top_k=None
  3. Added new test file test_audio_classification_top_k.py with three test cases:
    • Testing top_k=None returns all labels
    • Testing behavior with models having fewer labels
    • Testing top_k greater than available labels

Testing

Added comprehensive tests that verify:

  1. When top_k=None, all labels are returned
  2. Models with fewer labels than the default work correctly
  3. Large top_k values are properly capped to available labels
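In pytest style, those three checks might look like the sketch below. run_pipeline is a hypothetical stand-in for invoking the real audio classification pipeline, not the PR's actual test code.

```python
# Hedged sketch of the three behaviours under test; `run_pipeline`
# is a stand-in for running the real audio classification pipeline.

def run_pipeline(scores, top_k=5):
    k = len(scores) if top_k is None else min(top_k, len(scores))
    return sorted(scores, reverse=True)[:k]

def test_top_k_none_returns_all_labels():
    assert len(run_pipeline([0.5, 0.3, 0.2], top_k=None)) == 3

def test_model_with_fewer_labels_than_default():
    # A two-label model must not crash under the default top_k=5.
    assert run_pipeline([0.9, 0.1]) == [0.9, 0.1]

def test_large_top_k_is_capped():
    assert len(run_pipeline([0.6, 0.4], top_k=100)) == 2

for test in (test_top_k_none_returns_all_labels,
             test_model_with_fewer_labels_than_default,
             test_large_top_k_is_capped):
    test()
print("all checks passed")
```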

Test Results

(Screenshot of test results, 2025-01-19 at 2:21 PM)

All test cases pass successfully, confirming the fix works as intended while maintaining backward compatibility.

Note About Warning:
During test execution, we see the following UserWarning:

Passing gradient_checkpointing to a config initialization is deprecated and will be removed in v5 Transformers. Using model.gradient_checkpointing_enable() instead...

This warning is unrelated to our top_k changes. It comes from the model configuration initialization and is about the deprecation of setting gradient checkpointing via config. This is a broader change planned for Transformers v5 to move gradient checkpointing configuration from model initialization to explicit method calls. It doesn't affect our audio classification pipeline's functionality or the top_k parameter behavior.

The warning could be avoided by switching to a different multi-label model that supports gradient checkpointing, but I decided to leave the original model in place, since the warning doesn't affect this PR in any way.

cc: @Rocketknight1

@Rocketknight1
Member

Yes, this looks clean to me, and I really appreciate the broad test coverage!

cc @wilke0818, does this fix the issues you saw?

@wilke0818

Yep, I just tried it on the original code I had and did some stress testing; it seems to work in all the scenarios I can think of. Thanks for the quick responses and fixes! I also did a quick check, and this issue doesn't seem to exist in the other classification pipelines (I don't see any top_k default value being set).

@sambhavnoobcoder
Contributor Author

Cool. Then I think we can merge this, @Rocketknight1?

@Rocketknight1
Member

Yes, approving, and thank you for the PR!

@Rocketknight1 merged commit 0de15c9 into huggingface:main Feb 5, 2025
25 checks passed
Collaborator

cc @Rocketknight1 this shouldn't need a single file on its own, this should go in the pipeline tests

Member

Will fix, my bad for not noticing it was a separate file in the review!

MekkCyber pushed a commit that referenced this pull request Feb 7, 2025
…#35736 (#35771)

* added condition for top_k Doc mismatch fix

* initilation of test file for top_k changes

* added test for returning all labels

* added test for few labels

* tests/test_audio_classification_top_k.py

* final fix

* ruff fix

---------

Co-authored-by: sambhavnoobcoder <[email protected]>
elvircrn pushed a commit to elvircrn/transformers that referenced this pull request Feb 13, 2025
…huggingface#35736 (huggingface#35771)

sbucaille pushed a commit to sbucaille/transformers that referenced this pull request Feb 16, 2025
…huggingface#35736 (huggingface#35771)

@sambhavnoobcoder
Contributor Author

Hey @Rocketknight1, sorry for the inconvenience on a merged PR, but I was looking through my past contributions and found that even though I've had PRs merged, like this one and others such as 35859, 35858, 35735, and 36345, I can't see my name/avatar on the contributors list for this repo. Could you look into why that is happening? Is there some other process I'm missing to be listed there? I posted about this on Discord first, but was told to discuss it directly on the GitHub issue itself, hence the query.

@Rocketknight1
Member

Hi @sambhavnoobcoder, do you mean here? https://github.com/huggingface/transformers/graphs/contributors

If so, that's part of Github, not managed by us. It's sorted by total commits (PRs merged) to the repo. You'll turn up there eventually!

@sambhavnoobcoder
Contributor Author

Yes @Rocketknight1, that's what I was looking at. I added up the number of commits from all of my PRs that were merged so far and thought I would appear there based on that number, but if the metric for the list is something different, then I'm not sure.

@Rocketknight1
Member

Hi @sambhavnoobcoder, actually, one single PR being merged counts as one "commit" to Transformers. Even though the PR branch has multiple commits, it's squashed into a single update to the main branch that is recorded as a single commit.

That makes it very hard for the rest of us to catch up with founders like @julien-c and @thomwolf who were able to just make lots of raw commits directly to the repo in the early days 😅

@sambhavnoobcoder
Contributor Author

Ohh, that makes perfect sense!! Haha, I was wondering if I was doing something wrong. Thanks for the quick reply.



Development

Successfully merging this pull request may close these issues.

Audio-Classification Pipeline top_k Documentation mismatch and bug (possibly generalizes to any classification pipelines)
