App is getting into endless loop #508

Closed
MosheMaorKaltura opened this issue Feb 16, 2023 · 5 comments
Labels
decoding Decoding related issues enhancement New feature or request

Comments

@MosheMaorKaltura

I am running whisper.cpp inside Docker; as a POC I translated 800 WAV files.

In a few cases (fewer than 5), the client gets into an endless loop at a certain point in the audio.
If I tell it to start a second after the loop point, it transcribes the audio as expected.
A few notes:

  1. The audio is English.
  2. It happens across all models.
  3. Using the Python version, the translation is fine.
  4. It is consistent, always at the same point in time.
    This is the output that I get (I changed some of the text for privacy reasons):
    .....
    [00:00:30.000 --> 00:00:32.000] Some valid text bla bla
    [00:00:32.000 --> 00:00:35.000] Some valid text bla bla
    [00:00:35.000 --> 00:00:53.000] Some valid text bla bla
    [00:00:53.000 --> 00:01:09.000] Some valid text bla bla
    [00:01:09.000 --> 00:01:30.000] We were working at high school.
    [00:01:30.000 --> 00:01:45.000] We were working at high school.
    [00:01:45.000 --> 00:02:06.000] We were working at high school.
    [00:02:06.000 --> 00:02:21.000] We were working at high school.
    [00:02:21.000 --> 00:02:40.000] We were working at high school.
    ...
    It goes on like that up to the end.
@LaurenzV

LaurenzV commented Feb 16, 2023

Yeah, I've had the same thing happen to me with Chinese audio files. It only happens at a very few positions in the audio, but when it does, it goes on until the end of the audio, and it also seems to happen consistently when I restart the transcription process.

@guardiaopt

I also have the same problem when transcribing in Portuguese.
Is there a solution for this?

@geimist

geimist commented Feb 18, 2023

Have you checked this: #408 (reply in thread)

@guardiaopt

With the parameter "--max-context 0" it no longer loops infinitely until the end of the file. There are still some loops, but they are relatively small. Thank you!
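
For anyone hitting this through the C API rather than the CLI, here is a minimal sketch of what "--max-context 0" maps to, assuming the whisper.h names whisper_init_from_file, whisper_full and n_max_text_ctx (they may differ between versions):

```cpp
// Minimal sketch, assuming the whisper.h C API; field and function names
// may differ between whisper.cpp versions.
#include "whisper.h"

#include <vector>

int transcribe_without_text_context(const char * model_path, const std::vector<float> & pcm_f32) {
    struct whisper_context * ctx = whisper_init_from_file(model_path);
    if (ctx == nullptr) {
        return 1;
    }

    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    // "--max-context 0": do not condition the decoder on previously decoded
    // text, which is what lets a repetition loop feed on itself across segments.
    params.n_max_text_ctx = 0;

    const int ret = whisper_full(ctx, params, pcm_f32.data(), (int) pcm_f32.size());

    whisper_free(ctx);
    return ret;
}
```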

@ggerganov ggerganov added enhancement New feature or request decoding Decoding related issues labels Feb 19, 2023
@ggerganov
Copy link
Member

This behavior occurs when the entropy-based repetition detection fails. It can sometimes be mitigated by adjusting the entropy threshold as explained here:

#471 (comment)

A more robust strategy needs to be implemented.

Alternatively, you can try to use the beam-search decoder, but it will make the processing slower.
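
For reference, a rough sketch of these two mitigations in terms of whisper_full_params, assuming the entropy_thold and beam_search.beam_size fields from whisper.h; the values are illustrative, not recommended defaults:

```cpp
// Rough sketch of the two mitigations mentioned above, assuming the
// whisper_full_params fields from whisper.h. Example values only.
#include "whisper.h"

struct whisper_full_params make_mitigation_params(bool use_beam_search) {
    if (use_beam_search) {
        // Beam-search decoding: more robust against repetition loops,
        // but slower, since several decoders run for every segment.
        struct whisper_full_params params =
            whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH);
        params.beam_search.beam_size = 5; // corresponds to the --beam-size flag
        return params;
    }

    // Greedy decoding with a stricter entropy threshold: a segment whose
    // average token entropy falls below this value is treated as repetitive
    // and triggers a temperature fallback (corresponds to --entropy-thold).
    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.entropy_thold = 2.8f; // raising it above the ~2.4 default makes the check stricter
    return params;
}
```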

ggerganov added a commit that referenced this issue Apr 15, 2023
I disabled this because there were many complaints about slow decoding.
The current implementation does not allow batching the decoders when
using the "best of" or "beam size" parameters, so the decoding time is
proportional to the number of decoders, which is obviously not great.

However, now there are even more complaints about wrong decodings and
repetition.

So, making a compromise by re-enabling the fallbacks, but defaulting to
just 2 "best of" / "beam size" decoders. Also, the temperature step is
increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum
of 2.

Also, the stream example now has fallbacks enabled by default.

close #471 #477 #508 #612 #719 #731
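
Expressed as user-side overrides, the compromise described above roughly amounts to the following sketch, assuming the whisper_full_params fields greedy.best_of, beam_search.beam_size and temperature_inc from whisper.h (the actual change is in the library defaults):

```cpp
// Sketch of the compromise described in the commit message, as user-side
// parameter overrides. Field names assumed from whisper.h.
#include "whisper.h"

struct whisper_full_params fallback_compromise_params() {
    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    // 2 decoders instead of 5 keeps the fallback cost bounded, because the
    // decoders run one after another rather than batched.
    params.greedy.best_of        = 2;
    params.beam_search.beam_size = 2; // used if the strategy is switched to beam search

    // Larger temperature step: 0.0 -> 0.4 -> 0.8, i.e. at most 2 fallbacks
    // instead of the 5 that a 0.2 step allows.
    params.temperature_inc = 0.4f;

    return params;
}
```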
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this issue Oct 24, 2023
landtanin pushed a commit to landtanin/whisper.cpp that referenced this issue Dec 16, 2023
iThalay pushed a commit to iThalay/whisper.cpp that referenced this issue Sep 23, 2024