Fix int16 audio overflow in ASR Gradio demo #383

Open
hobostay wants to merge 1 commit into microsoft:main from hobostay:fix/int16-audio-overflow

@hobostay hobostay commented May 4, 2026

Summary

  • Fix the float32→int16 audio conversion, which scaled by 32768.0 instead of 32767.0 and overflowed int16 whenever a sample's amplitude was >= 1.0
  • The overflow wraps large positive samples around to negative values, producing audible distortion
  • Add np.clip to guard against out-of-range values
  • This aligns with the existing fix in gradio_asr_demo_api_video.py (line 586) which already uses 32767.0 with the comment "Use 32767.0 instead of 32768.0 to avoid potential overflow"

Details

Affected file: demo/vibevoice_asr_gradio_demo.py (lines 298 and 491)

The int16 range is [-32768, 32767]. Multiplying a float value of 1.0 by 32768.0 gives 32768.0, which is one past the int16 maximum; when cast to np.int16 it typically wraps around to -32768.
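The arithmetic above can be illustrated with a minimal sketch. The helper name `float_to_int16` and the choice to clip before scaling are assumptions made for illustration; the actual diff at lines 298 and 491 is not shown here and may differ.

```python
import numpy as np

INT16_MAX = np.iinfo(np.int16).max  # 32767

def float_to_int16(audio: np.ndarray) -> np.ndarray:
    """Hypothetical helper: convert float audio nominally in [-1.0, 1.0] to int16."""
    # Clip first so samples outside [-1.0, 1.0] saturate instead of overflowing.
    clipped = np.clip(audio, -1.0, 1.0)
    # Scale by 32767.0 so that +1.0 maps exactly to the int16 maximum.
    return (clipped * 32767.0).astype(np.int16)

# The old scale factor pushes a full-scale sample one past the int16 maximum.
old_scaled = 1.0 * 32768.0
exceeds_int16 = old_scaled > INT16_MAX  # True

samples = np.array([-1.5, -1.0, 0.0, 0.5, 1.0, 1.5], dtype=np.float32)
pcm = float_to_int16(samples)
print(pcm.tolist())  # [-32767, -32767, 0, 16383, 32767, 32767]
```

Out-of-range inputs such as 1.5 saturate at the extremes rather than changing sign, which is the behavior the np.clip guard is meant to provide.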

Test plan

  • Verify audio playback quality in the Gradio demo has no distortion for loud audio segments
  • Verify the fix matches the approach already used in gradio_asr_demo_api_video.py
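The listening check in the test plan could be complemented by a small automated check. The sketch below is an assumption-laden illustration, not part of the PR: `convert` is a hypothetical stand-in for the patched conversion, and the 440 Hz tone, 16 kHz sample rate, and 1.2 gain are arbitrary choices used to produce a deliberately over-loud signal.

```python
import numpy as np

def convert(audio: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the patched conversion: clip, scale, cast.
    return (np.clip(audio, -1.0, 1.0) * 32767.0).astype(np.int16)

# One second of a 440 Hz tone whose peaks exceed full scale.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
loud = 1.2 * np.sin(2 * np.pi * 440.0 * t)

pcm = convert(loud)

# Wrap-around would show up as a sample whose sign differs from the input's.
# Samples near zero are skipped because they may truncate to exactly 0.
mask = np.abs(loud) > 0.01
no_sign_flips = bool(np.all(np.sign(pcm[mask]) == np.sign(loud[mask])))

# Over-loud peaks should saturate at the clipped extremes.
peak_max = int(pcm.max())
peak_min = int(pcm.min())
```

If `no_sign_flips` is true and the peaks sit at 32767 and -32767, the loud segments saturated instead of wrapping.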

🤖 Generated with Claude Code

Use 32767.0 instead of 32768.0 when converting float32 audio to int16.
Values of 1.0 multiplied by 32768.0 overflow int16's max of 32767,
causing wrap-around to negative values and audible distortion.
Also add np.clip to guard against out-of-range values.

This aligns with the fix already applied in gradio_asr_demo_api_video.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>