Fix torchcodec audio decoding to respect 'num_channels' #8028

Open
AsymptotaX wants to merge 3 commits into huggingface:main from AsymptotaX:fix/issue-8005-audio-num-channels
@AsymptotaX

Fixes torchcodec audio decoding when num_channels is set on Audio.

Before this change, AudioDecoder["array"] reduced multi-channel audio to mono by averaging channels, so the requested channel behavior was not respected.

With this PR:

  • multi-channel decoded arrays are preserved by default;
  • mono output is returned only when num_channels == 1 is explicitly requested.

Previous behavior

ds_stereo = ds.cast_column("audio", Audio(num_channels={...})), where {...} is None, 1, or 2 in turn:

Original file: '(16000, 2)' - stereo ✓
'Audio(num_channels=None)': '(16000,)' - mono ✗
'Audio(num_channels=2)': '(16000,)' - mono ✗
'Audio(num_channels=1)': '(16000,)' - mono ✓

New behavior

'num_channels=None' preserves the original number of channels from the source file.
'num_channels=2' preserves/converts to stereo output with shape '(2, num_samples)'.
'num_channels=1' downmixes to mono with shape '(num_samples,)'.
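
The three rules above can be sketched with plain NumPy. This is only an illustration, not the code in this PR: 'apply_num_channels' is a hypothetical helper, the input is assumed to be laid out as '(channels, num_samples)', and duplicating a mono channel to reach a higher channel count is an assumption about what "converts to stereo" means.

```python
import numpy as np


def apply_num_channels(samples, num_channels=None):
    """Hypothetical helper: samples has shape (channels, num_samples)."""
    if num_channels is None:
        # Keep whatever channel layout the source file had.
        return samples
    if num_channels == 1:
        # Downmix by averaging channels; result has shape (num_samples,).
        return samples.mean(axis=0)
    if samples.shape[0] == num_channels:
        # Already has the requested number of channels.
        return samples
    if samples.shape[0] == 1:
        # Assumption: upmix mono by repeating the single channel.
        return np.repeat(samples, num_channels, axis=0)
    raise ValueError(
        f"cannot convert {samples.shape[0]} channels to {num_channels}"
    )


stereo = np.zeros((2, 16000), dtype=np.float32)
print(apply_num_channels(stereo).shape)      # (2, 16000)
print(apply_num_channels(stereo, 1).shape)   # (16000,)
print(apply_num_channels(stereo, 2).shape)   # (2, 16000)
```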

Results

Original file shape (via soundfile): (16000, 2)
HF datasets shape with num_channels=None: (2, 16000)
HF datasets shape with num_channels=1: (16000,)
HF datasets shape with num_channels=2: (2, 16000)
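
Note that the two libraries report different axis orders: soundfile returns '(num_samples, channels)', while the shapes listed for datasets are '(channels, num_samples)'. A small sketch of lining the two up for comparison, using a synthetic array in place of a real file (the array contents are made up for illustration):

```python
import numpy as np

# Stand-in for what soundfile returns for a 1-second, 16 kHz stereo file:
# shape (num_samples, channels).
sf_style = np.zeros((16000, 2), dtype=np.float32)

# Transpose to the channels-first layout shown for the datasets output.
channels_first = sf_style.T
print(channels_first.shape)               # (2, 16000)

# Averaging over the channel axis gives the mono shape.
print(channels_first.mean(axis=0).shape)  # (16000,)
```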

Fixes #8005.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@lhoestq (Member) left a comment

Hi! I believe ["array"] is used a lot in transformers and has been expected to be mono for a long time, since before we switched to torchcodec. So we might need to keep mono as the default for ["array"] unless num_channels is specified explicitly. Would that be OK for you?

I see in your tests that you expect stereo even if num_channels is not specified in Audio()
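
For comparison with the PR description, the backward-compatible default suggested in this comment could look roughly like the sketch below. It is one reading of the suggestion, not code from the PR or from datasets; 'array_field_mono_default' is a hypothetical name and the input is again assumed to be '(channels, num_samples)'.

```python
import numpy as np


def array_field_mono_default(samples, num_channels=None):
    """Hypothetical: mono unless a channel count is requested explicitly."""
    if num_channels is None or num_channels == 1:
        # Default and explicit mono both downmix by averaging channels.
        return samples.mean(axis=0)
    # An explicit multi-channel request keeps the decoded channels
    # (this sketch assumes they already match num_channels).
    return samples


stereo = np.zeros((2, 16000), dtype=np.float32)
print(array_field_mono_default(stereo).shape)     # (16000,)
print(array_field_mono_default(stereo, 2).shape)  # (2, 16000)
```

Under this variant only the 'num_channels=None' case differs from the behavior proposed in the PR description.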

Development

Successfully merging this pull request may close these issues.

Multi-channel audio is automatically cast to mono, num_channels is ignored

3 participants