App is getting into endless loop #508
Yeah, I've had the same happening to me with Chinese audio files... It only happens at a few positions in the audio, but when it does, it goes on until the end of the file, and it also seems to happen consistently when I restart the transcription process...
I also have the same problem when transcribing Portuguese.
Have you checked this: #408 (reply in thread)
With the parameter "--max-context 0" it no longer loops infinitely until the end of the file; there are still some loops, but they are relatively short. Thank you!
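For anyone else trying this, a minimal sketch of the full invocation, assuming the stock main example (model and file paths are placeholders):

```sh
# Disable reuse of previous text context; this often breaks repetition loops
./main -m models/ggml-base.bin -f audio.wav --max-context 0
```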
This behavior occurs when the entropy-based repetition detection fails. It can sometimes be mitigated by adjusting the entropy threshold as explained here: A more robust strategy needs to be implemented. Alternatively, you can try the beam-search decoder, but it will make processing slower.
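A rough sketch of both suggestions, assuming the flag names of the main example (the threshold value here is illustrative; the default is around 2.4):

```sh
# Raise the entropy threshold so repetitive (low-entropy) segments
# are more likely to be rejected and retried at a higher temperature
./main -m models/ggml-base.bin -f audio.wav --entropy-thold 2.8

# Or switch to the beam-search decoder (slower, but more robust)
./main -m models/ggml-base.bin -f audio.wav --beam-size 5
```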
I disabled this because there were many complaints about slow decoding. The current implementation does not allow batching the decoders when using the "best of" or "beam size" parameters, so the decoding time is proportional to the number of decoders, which is obviously not great. However, now there are even more complaints about wrong decodings and repetition. So, making a compromise by re-enabling the fallbacks, but defaulting to just 2 "best of" / "beam size" decoders. Also, the temperature step is increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum of 2. Also, the stream example now has fallbacks enabled by default. close #471 #477 #508 #612 #719 #731
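If the reduced defaults still produce loops, the decoder pool can be widened again from the command line. A hedged sketch, assuming the main example's flags (values illustrative, and decoding will be proportionally slower):

```sh
# Use more decoders than the new defaults of 2, trading speed for robustness
./main -m models/ggml-base.bin -f audio.wav --best-of 5 --beam-size 5
```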
I am running whisper.cpp inside Docker; as a POC I translated 800 WAV files.
In a few cases (fewer than 5), the client gets into an endless loop at a certain point in the audio.
If I tell it to start a second after the loop point, it transcribes the audio as expected.
A few notes:
This is the output that I get (I changed some of the text for privacy reasons):
.....
[00:00:30.000 --> 00:00:32.000] Some valid text bla bla
[00:00:32.000 --> 00:00:35.000] Some valid text bla bla
[00:00:35.000 --> 00:00:53.000] Some valid text bla bla
[00:00:53.000 --> 00:01:09.000] Some valid text bla bla
[00:01:09.000 --> 00:01:30.000] We were working at high school.
[00:01:30.000 --> 00:01:45.000] We were working at high school.
[00:01:45.000 --> 00:02:06.000] We were working at high school.
[00:02:06.000 --> 00:02:21.000] We were working at high school.
[00:02:21.000 --> 00:02:40.000] We were working at high school.
...
It goes on like that until the end of the file.
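For anyone hitting the same loop, a hedged sketch of the "start after the loop point" workaround, using the main example's time-offset flag (value in milliseconds, illustrative; here just past the 01:09 mark where the loop above begins):

```sh
# Resume transcription one second past the loop point (1 min 10 s in)
./main -m models/ggml-base.bin -f audio.wav --offset-t 70000
```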