Conversation
Codecov Report❌ Patch coverage is
🚀 New features to boost your workflow:
|
a5f014c to
6c1229e
Compare
85c02ef to
97bca9d
Compare
| # Jitter buffer: accumulate audio before starting playback to absorb network latency | ||
| # ResembleAI sends audio in bursts with 300-450ms gaps between them | ||
| # We need to buffer enough to cover these gaps before starting playback | ||
| self._jitter_buffer_bytes = 44100 # ~1000ms at 22050Hz to handle 400ms+ network gaps | ||
| self._playback_started: dict[str, bool] = {} # Track if we've started playback per request |
There was a problem hiding this comment.
I think that for all other TTS services we send the audio as soon as we get it.
And I thought we were already handling this buffering before starting playback somewhere else in the pipeline, but I might be mistaken.
So I’m just double checking if this is really needed for this service.
There was a problem hiding this comment.
I think this is kind of the same thing with _playback_started.
It feels like this service is slightly more complicated than the others we’ve developed, so I just want to confirm that this is really needed here.
There was a problem hiding this comment.
Yes, we do, but when I do that, the audio is choppy. I think this is an issue with the Resemble TTS service, which would ideally be fixed in the service itself. That is, you can't stream audio from Resemble, you have to stream to the client and then buffer. You can see the version before this without the buffering:
6c1229e
cc @krishvadhani19 who wrote this code.
There was a problem hiding this comment.
I see,yeah, the downside is that we would basically always add a 400 ms latency due to the buffer. But if this is a current service limitation, I think it makes sense.
There was a problem hiding this comment.
Agreed. This is a pretty big downside. The best TTS services has a P90+ latency of 200ms.
@krishvadhani19 can we avoid needing to buffer? Without it, I hear very choppy audio.
filipi87
left a comment
There was a problem hiding this comment.
LGTM!
Hopefully we’ll be able to remove this audio buffer in the future, but for now it probably makes sense to keep it to prevent the audio from being choppy.
Hey @markbackman, the buffer I have maintained is because resemble streams audio in bursts. We have sometimes gaps of more than 500ms. So this initial buffer helps solve it someway. I have communicated the above problem with the team. Once fixed, I will get rid of this buffer from our service. thank you. cc: @filipi87 |
|
@krishvadhani19 thanks for the reply. Given that, we'll merge as is and once your service no longer requires this, we can remove the buffering. |
thank you @markbackman. |
97bca9d to
a592b7f
Compare
Please describe the changes in your PR. If it is addressing an issue, please reference that as well.
cc @krishvadhani19 for review.