Open
Labels: bug (Something isn't working)
System Info
transformers.js version 3.6.1
Environment/Platform
- Website/web-app
- Browser extension
- Server-side (e.g., Node.js, Deno, Bun)
- Desktop app (e.g., Electron)
- Other (e.g., VSCode extension)
Description
Running whisper-large-v3-turbo_timestamped produces broken timestamps.
Reproduction
Run the following code (takes ~30 seconds) and watch the console log:
<script type="module">
  // Load transformers.js 3.6.1 from the jsDelivr CDN.
  const { env, pipeline } = await import(`https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.6.1/dist/transformers.min.js`);
  env.allowLocalModels = false;

  // tos.pcm is read as raw 32-bit float samples.
  const buffer = await (await fetch("tos.pcm")).arrayBuffer();
  const audio = new Float32Array(buffer);

  const pipe = await pipeline(
    "automatic-speech-recognition",
    "onnx-community/whisper-large-v3-turbo_timestamped",
    { dtype: "fp16", device: "webgpu" },
  );
  const result = await pipe(audio, {
    chunk_length_s: 30,
    stride_length_s: 5,
    return_timestamps: "word",
    language: "en",
  });

  // Print one "start -> end text" line per word chunk.
  console.log(result.chunks.map(chunk => `${chunk.timestamp[0]} -> ${chunk.timestamp[1]} ${chunk.text}`));
</script>
The input is a 60-second .pcm file. The console prints:
...
66: "40.82 -> 40.98 by"
67: "40.98 -> 41.1 these"
68: "41.1 -> 41.44 giant"
69: "41.44 -> 42.02 robotic"
70: "42.02 -> 69.98 claws--"
71: "69.98 -> 69.98 Oh,"
72: "69.98 -> 69.98 whatever,"
73: "69.98 -> 69.98 we're"
74: "69.98 -> 69.98 done!"
75: "69.98 -> 69.98 We're"
76: "69.98 -> 69.98 done!"
77: "69.98 -> 69.98 Robot's"
78: "69.98 -> 69.98 memory"
79: "69.98 -> 69.98 synced"
80: "69.98 -> 69.98 and"
81: "69.98 -> 69.98 locked."
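There are two problems here:
- the timestamps are not precise: "claws--" spans 42.02 -> 69.98, and every word after it collapses to 69.98 -> 69.98
- the timestamps go beyond the source duration: 69.98 s is reported for a 60-second input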
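Both symptoms can be checked for programmatically. The snippet below is only a diagnostic sketch, not a fix and not part of transformers.js: `durationSeconds` and `findSuspectChunks` are hypothetical helper names, and the 16 kHz sample rate is an assumption about how tos.pcm was encoded.

```javascript
// Assumption: the .pcm file holds mono samples at 16 kHz.
const SAMPLE_RATE = 16000;

// Hypothetical helper: duration of the decoded audio in seconds.
function durationSeconds(audio, sampleRate = SAMPLE_RATE) {
  return audio.length / sampleRate;
}

// Hypothetical helper: returns the chunks whose timestamps cannot be right
// for audio of the given duration. A chunk is flagged when its end is
// missing, when it starts or ends past the end of the audio, or when it
// has zero or negative length.
function findSuspectChunks(chunks, durationS) {
  return chunks
    .map((chunk, index) => ({ index, ...chunk }))
    .filter(({ timestamp: [start, end] }) =>
      end === null || start > durationS || end > durationS || end <= start);
}

// Three chunks taken from the console output above, checked against 60 s.
const sample = [
  { text: "robotic", timestamp: [41.44, 42.02] },
  { text: "claws--", timestamp: [42.02, 69.98] },
  { text: "Oh,", timestamp: [69.98, 69.98] },
];
const suspects = findSuspectChunks(sample, 60);
console.log(suspects.map(c => `${c.index}: ${c.text}`)); // flags indices 1 and 2
```

In the reproduction, the same check would be `findSuspectChunks(result.chunks, durationSeconds(audio))`.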
The tos.pcm file is attached as a .zip.