Open
Labels: bug (Something isn't working)
Description
System Info
transformers.js: 3.6.1
Environment/Platform
- Website/web-app
- Browser extension
- Server-side (e.g., Node.js, Deno, Bun)
- Desktop app (e.g., Electron)
- Other (e.g., VSCode extension)
Description
Using chunk_length_s=30 with onnx-community/whisper-base_timestamped
produces broken timestamps.
Reproduction
Run the following code with the attached src.pcm (in the .zip) and check the output in the console log:
<script type="module">
  // Load transformers.js 3.6.1 (the version listed under System Info) from the CDN.
  const { env, pipeline } = await import("https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.6.1/dist/transformers.min.js");
  env.allowLocalModels = false;

  // src.pcm is read as raw 32-bit float samples.
  const buffer = await (await fetch("src.pcm")).arrayBuffer();
  const audio = new Float32Array(buffer);

  const pipe = await pipeline(
    "automatic-speech-recognition",
    "onnx-community/whisper-base_timestamped",
    {
      dtype: { encoder_model: "fp32", decoder_model_merged: "q4" },
      device: "webgpu",
    },
  );

  const result = await pipe(audio, {
    chunk_length_s: 30,
    stride_length_s: 5,
    return_timestamps: "word",
    language: "en",
  });

  // Print one "start -> end text" line per word chunk.
  console.log(result.chunks.map(chunk => `${chunk.timestamp[0]} -> ${chunk.timestamp[1]} ${chunk.text}`));
</script>
It prints:
"29.98 -> 29.98 every",
"29.98 -> 29.98 day",
"29.98 -> 29.98 style."
The timestamps are invalid, and the audio contains far more speech than these three words.
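To spot this kind of output programmatically, here is a minimal sketch of a sanity check over `result.chunks`. The helper name `findSuspiciousChunks` and its rules (zero or negative duration, a start before the previous word's end) are my own assumptions for illustration and are not part of transformers.js. It only assumes each chunk has the `{ text, timestamp: [start, end] }` shape used in the reproduction above.

```javascript
// Hypothetical helper (not part of transformers.js): flags word chunks
// whose timestamps look implausible.
function findSuspiciousChunks(chunks) {
  const suspicious = [];
  let previousEnd = 0; // end time of the latest valid word seen so far

  for (const chunk of chunks) {
    const [start, end] = chunk.timestamp;
    const reasons = [];

    if (start == null || end == null) {
      reasons.push("missing timestamp");
    } else {
      if (end <= start) reasons.push("zero or negative duration");
      if (start < previousEnd) reasons.push("starts before previous word ends");
      previousEnd = Math.max(previousEnd, end);
    }

    if (reasons.length > 0) {
      suspicious.push({ text: chunk.text, timestamp: chunk.timestamp, reasons });
    }
  }
  return suspicious;
}

// The three chunks reported with chunk_length_s: 30.
const brokenChunks = [
  { text: "every", timestamp: [29.98, 29.98] },
  { text: "day", timestamp: [29.98, 29.98] },
  { text: "style.", timestamp: [29.98, 29.98] },
];
console.log(findSuspiciousChunks(brokenChunks).length);
```

Running the same helper on the output of each `chunk_length_s` setting gives a quick way to compare them without reading the whole log.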
Changing chunk_length_s to 29 fixes the issue and produces mostly valid output:
"0 -> 0.42 everyday",
"0.42 -> 0.86 style.",
"1.38 -> 1.56 - True",
"1.56 -> 2 classic",
"2 -> 2.5 delivers",
"2.5 -> 3.02 premium",
"3.02 -> 3.54 essentials",
"3.54 -> 3.84 built",
"3.84 -> 4.08 for",
"4.08 -> 4.42 real",
"4.42 -> 4.9 life.",
"5.4 -> 5.64 Grab",
"5.64 -> 6.04 yours",
"6.04 -> 6.36 at",
"6.36 -> 6.86 Target,",
"7.3 -> 7.78 Costco,",
"8.28 -> 8.3 or",
"8.3 -> 8.5 head",
"8.5 -> 8.7 to",
"8.7 -> 9.34 TrueClassic",
"9.34 -> 10.04 .com",
"10.04 -> 12.28 /p4p.",
"12.86 -> 13.08 Get",
"13.08 -> 13.38 hooked",
"13.38 -> 13.52 up",
"13.52 -> 13.94 today.",
"14.16 -> 14.24 Now",
"14.24 -> 14.46 before",
"14.46 -> 14.62 we",
"14.62 -> 14.82 go,",
"15.1 -> 15.24 just",
"15.24 -> 15.42 wanna",
"15.42 -> 15.56 give",
"15.56 -> 15.68 a",
"15.68 -> 15.86 big",
"15.86 -> 16.1 shout",
"16.1 -> 16.28 out",
"16.28 -> 16.76 to",
"16.76 -> 16.9 the",
"16.9 -> 17.52 CEO",
"17.52 -> 17.86 and",
"17.86 -> 18.32 founder,",
"18.48 -> 18.6 Ryan",
"18.6 -> 18.92 Frouder,",
"18.98 -> 19.06 for",
"19.06 -> 19.22 coming",
"19.22 -> 19.36 on",
"19.36 -> 19.5 our",
"19.5 -> 19.76 show",
"19.76 -> 20.4 and",
"20.4 -> 20.6 just",
"20.6 -> 20.86 showing",
"20.86 -> 21.08 some",
"21.08 -> 21.28 love.",
"21.46 -> 21.62 Now,",
"21.9 -> 22.36 let's",
"22.36 -> 22.46 get",
"22.46 -> 22.7 back",
"22.7 -> 23.06 to",
"23.06 -> 23.22 the",
"23.22 -> 23.54 episode",
"24.32 -> 24.44 I",
"24.44 -> 24.6 mean",
"24.6 -> 25.4 like",
"25.4 -> 25.52 I",
"25.52 -> 25.68 said",
"25.68 -> 26.12 we're",
"26.12 -> 26.34 going",
"26.34 -> 26.58 through",
"26.58 -> 26.82 that",
"26.82 -> 27.34 we're",
"27.34 -> 27.56 losing",
"27.56 -> 28.02 stars",
"28.02 -> 29.26 and",
"29.26 -> 29.38 then",
"29.38 -> 29.56 we",
"29.56 -> 29.84 kind",
"29.84 -> 29.98 of"
Why is 30 broken in this case? Is 29 safer in all cases, or is it just a coincidence?
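For context on how the two parameters interact, here is a rough sketch of the usual chunk-and-stride windowing used for long-form ASR. It is an assumption that transformers.js follows this exact scheme internally. The function name `chunkWindows` and the 16 kHz default sampling rate are illustrative, and the sketch does not by itself explain why 30 fails.

```javascript
// Hypothetical sketch of chunk/stride windowing; not copied from the library.
// Each window covers chunkLengthS seconds, and consecutive windows overlap
// by 2 * strideLengthS seconds.
function chunkWindows(numSamples, chunkLengthS, strideLengthS, samplingRate = 16000) {
  const windowSize = samplingRate * chunkLengthS;
  const stride = samplingRate * strideLengthS;
  const jump = windowSize - 2 * stride; // how far each window advances
  if (jump <= 0) {
    throw new Error("chunk_length_s must be larger than 2 * stride_length_s");
  }

  const windows = [];
  let offset = 0;
  while (true) {
    const end = Math.min(offset + windowSize, numSamples);
    const isFirst = offset === 0;
    const isLast = offset + windowSize >= numSamples;
    windows.push({
      start: offset,
      end,
      strideLeft: isFirst ? 0 : stride, // overlap discarded on the left
      strideRight: isLast ? 0 : stride, // overlap discarded on the right
    });
    if (isLast) break;
    offset += jump;
  }
  return windows;
}

// Example: 60 seconds of audio with the settings from the reproduction.
const sixtySeconds = 60 * 16000;
console.log(chunkWindows(sixtySeconds, 30, 5));
console.log(chunkWindows(sixtySeconds, 29, 5));
```

Under this scheme both settings cut a 60-second clip into three overlapping windows, so the difference between 29 and 30 would have to come from how each individual window is decoded and aligned rather than from the window count.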
hammeiam