whisper-base_timestamped broken with chunk_length_s=30

### System Info

transformers.js: 3.6.1

### Environment/Platform

- [x] Website/web-app
- [ ] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [ ] Other (e.g., VSCode extension)

### Description

Using `chunk_length_s: 30` with `onnx-community/whisper-base_timestamped` produces broken word-level timestamps.

### Reproduction

Run the following code with the attached [src.pcm](https://github.com/user-attachments/files/21100555/src.zip) (zipped) and check the output in the console log:

```html
<script type="module">
// Load transformers.js 3.6.1 from the CDN and only allow remote models.
const { env, pipeline } = await import("https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.6.1/dist/transformers.min.js");
env.allowLocalModels = false;

// src.pcm is read as raw 32-bit float samples.
const buffer = await (await fetch("src.pcm")).arrayBuffer();
const audio = new Float32Array(buffer);

const pipe = await pipeline(
	"automatic-speech-recognition",
	"onnx-community/whisper-base_timestamped",
	{
		dtype: { encoder_model: "fp32", decoder_model_merged: "q4" },
		device: "webgpu",
	},
);

const result = await pipe(audio, {
	chunk_length_s: 30, // changing this to 29 gives plausible timestamps
	stride_length_s: 5,
	return_timestamps: "word",
	language: "en",
});

// Print one "start -> end text" entry per word chunk.
console.log(result.chunks.map(chunk => `${chunk.timestamp[0]} -> ${chunk.timestamp[1]} ${chunk.text}`));
</script>
```

It prints:

```
"29.98 -> 29.98  every",
"29.98 -> 29.98  day",
"29.98 -> 29.98  style."
```

The timestamps are invalid (every word starts and ends at 29.98), and the audio contains far more speech than these three words.
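To make "invalid" checkable, here is a minimal sketch of a sanity check over the `result.chunks` array. `findSuspiciousChunks` is a hypothetical helper, not part of transformers.js; it only assumes each chunk has the `{ text, timestamp: [start, end] }` shape printed above.

```javascript
// Hypothetical helper (not a transformers.js API): flags word chunks whose
// timestamps are missing, have zero/negative duration, or go backwards.
function findSuspiciousChunks(chunks) {
  const suspicious = [];
  let previousEnd = 0;
  for (const [index, chunk] of chunks.entries()) {
    const [start, end] = chunk.timestamp;
    const reasons = [];
    if (start == null || end == null) {
      reasons.push("missing timestamp");
    } else {
      if (end <= start) reasons.push("zero or negative duration");
      if (start < previousEnd) reasons.push("starts before previous word ended");
      previousEnd = Math.max(previousEnd, end);
    }
    if (reasons.length > 0) suspicious.push({ index, text: chunk.text, reasons });
  }
  return suspicious;
}

// The three chunks reported with chunk_length_s: 30.
const brokenChunks = [
  { text: " every", timestamp: [29.98, 29.98] },
  { text: " day", timestamp: [29.98, 29.98] },
  { text: " style.", timestamp: [29.98, 29.98] },
];

// The first three chunks reported with chunk_length_s: 29.
const plausibleChunks = [
  { text: " everyday", timestamp: [0, 0.42] },
  { text: " style.", timestamp: [0.42, 0.86] },
  { text: " - True", timestamp: [1.38, 1.56] },
];

console.log(findSuspiciousChunks(brokenChunks).length); // 3
console.log(findSuspiciousChunks(plausibleChunks).length); // 0
```

In the reproduction this could be called as `findSuspiciousChunks(result.chunks)`; a non-empty result for `chunk_length_s: 30` would match the output shown above.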

Changing `chunk_length_s` to `29` avoids the issue and produces plausible output:

```
"0 -> 0.42  everyday",
"0.42 -> 0.86  style.",
"1.38 -> 1.56  - True",
"1.56 -> 2  classic",
"2 -> 2.5  delivers",
"2.5 -> 3.02  premium",
"3.02 -> 3.54  essentials",
"3.54 -> 3.84  built",
"3.84 -> 4.08  for",
"4.08 -> 4.42  real",
"4.42 -> 4.9  life.",
"5.4 -> 5.64  Grab",
"5.64 -> 6.04  yours",
"6.04 -> 6.36  at",
"6.36 -> 6.86  Target,",
"7.3 -> 7.78  Costco,",
"8.28 -> 8.3  or",
"8.3 -> 8.5  head",
"8.5 -> 8.7  to",
"8.7 -> 9.34  TrueClassic",
"9.34 -> 10.04 .com",
"10.04 -> 12.28 /p4p.",
"12.86 -> 13.08  Get",
"13.08 -> 13.38  hooked",
"13.38 -> 13.52  up",
"13.52 -> 13.94  today.",
"14.16 -> 14.24  Now",
"14.24 -> 14.46  before",
"14.46 -> 14.62  we",
"14.62 -> 14.82  go,",
"15.1 -> 15.24  just",
"15.24 -> 15.42  wanna",
"15.42 -> 15.56  give",
"15.56 -> 15.68  a",
"15.68 -> 15.86  big",
"15.86 -> 16.1  shout",
"16.1 -> 16.28  out",
"16.28 -> 16.76  to",
"16.76 -> 16.9  the",
"16.9 -> 17.52  CEO",
"17.52 -> 17.86  and",
"17.86 -> 18.32  founder,",
"18.48 -> 18.6  Ryan",
"18.6 -> 18.92  Frouder,",
"18.98 -> 19.06  for",
"19.06 -> 19.22  coming",
"19.22 -> 19.36  on",
"19.36 -> 19.5  our",
"19.5 -> 19.76  show",
"19.76 -> 20.4  and",
"20.4 -> 20.6  just",
"20.6 -> 20.86  showing",
"20.86 -> 21.08  some",
"21.08 -> 21.28  love.",
"21.46 -> 21.62  Now,",
"21.9 -> 22.36  let's",
"22.36 -> 22.46  get",
"22.46 -> 22.7  back",
"22.7 -> 23.06  to",
"23.06 -> 23.22  the",
"23.22 -> 23.54  episode",
"24.32 -> 24.44  I",
"24.44 -> 24.6  mean",
"24.6 -> 25.4  like",
"25.4 -> 25.52  I",
"25.52 -> 25.68  said",
"25.68 -> 26.12  we're",
"26.12 -> 26.34  going",
"26.34 -> 26.58  through",
"26.58 -> 26.82  that",
"26.82 -> 27.34  we're",
"27.34 -> 27.56  losing",
"27.56 -> 28.02  stars",
"28.02 -> 29.26  and",
"29.26 -> 29.38  then",
"29.38 -> 29.56  we",
"29.56 -> 29.84  kind",
"29.84 -> 29.98  of"
```

Why does 30 break in this case? Is 29 safe in all cases, or is this just a coincidence?
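To narrow down whether 29 is generally safe, the same audio could be run with several chunk lengths and the results summarized side by side. The following is only a sketch: `compareChunkLengths` is a hypothetical helper, and its `transcribe` argument is assumed to be the `pipe` function from the reproduction, called with the same options.

```javascript
// Hypothetical helper (not a transformers.js API): runs the same audio through
// `transcribe` once per chunk length and summarizes the word-level output.
async function compareChunkLengths(transcribe, audio, chunkLengths) {
  const report = [];
  for (const chunk_length_s of chunkLengths) {
    const result = await transcribe(audio, {
      chunk_length_s,
      stride_length_s: 5,
      return_timestamps: "word",
      language: "en",
    });
    const chunks = result.chunks ?? [];
    // Words whose start and end timestamps are identical, as in the broken output.
    const zeroDuration = chunks.filter(({ timestamp: [start, end] }) => start === end).length;
    const lastEnd = chunks.length > 0 ? chunks[chunks.length - 1].timestamp[1] : null;
    report.push({ chunk_length_s, words: chunks.length, zeroDuration, lastEnd });
  }
  return report;
}
```

With the pipeline from the reproduction, `console.table(await compareChunkLengths(pipe, audio, [25, 28, 29, 30]))` would show, per chunk length, how many words came back and how many have zero duration.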

whisper-base_timestamped broken with chunk_length_s=30 #1358
