pipecat version
0.0.101
Python version
No response
Operating System
No response
Question
Currently, the UserIdleController only responds to UserSpeakingFrame and BotSpeakingFrame. There is also special handling for FunctionCallsStartedFrame to prevent timeouts while waiting for the LLM. However, this feels like a band-aid solution, as there are gaps between these frames. For example, network latency or CPU overload could cause the UserIdleController to trigger a timeout unexpectedly.
I think that once we detect that a user turn has stopped, we should no longer expect user speech. This should also apply when a user is muted.
Should UserIdleController stop waiting for user speech once a user turn ends or the user is muted?
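To make the proposal concrete, here is a minimal sketch of the behaviour I have in mind. It is not Pipecat code: `IdleGate` and all of its method names are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow. The idea is that the idle timer is armed only while user speech is actually expected (the bot has finished its turn and the user is not muted), and disarmed otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleGate:
    """Hypothetical idle tracker; NOT Pipecat's UserIdleController."""

    timeout_s: float
    muted: bool = False
    awaiting_user: bool = False  # True only while user speech is expected
    _deadline: Optional[float] = None

    def _rearm(self, now: float) -> None:
        # Arm the timer only when the user is both expected and able to speak.
        if self.awaiting_user and not self.muted:
            self._deadline = now + self.timeout_s
        else:
            self._deadline = None

    def on_bot_turn_ended(self, now: float) -> None:
        # The bot has finished speaking, so the user is expected to respond.
        self.awaiting_user = True
        self._rearm(now)

    def on_user_turn_ended(self, now: float) -> None:
        # The user has finished; the next move belongs to the LLM/bot,
        # so no user speech is expected until the bot turn ends.
        self.awaiting_user = False
        self._rearm(now)

    def on_user_activity(self, now: float) -> None:
        # Any user activity while waiting pushes the deadline forward.
        self._rearm(now)

    def set_muted(self, muted: bool, now: float) -> None:
        # A muted user cannot speak, so the timer is disarmed while muted
        # and restarted from scratch when the mute is lifted.
        self.muted = muted
        self._rearm(now)

    def is_idle(self, now: float) -> bool:
        return self._deadline is not None and now >= self._deadline
```

With this shape, latency between the end of the user turn and the first bot or function-call frame cannot cause a timeout, because the timer is simply not running during that gap.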
What I've tried
No response
Context
I encountered this issue while trying to implement a DTMF authentication processor. Consider the following scenario:
A user is muted from the start and remains muted until they are authenticated via a sequence of DTMF tones. The tones are collected in a custom frame processor.
In the current version of Pipecat, this workflow is not possible without running into the problem described above or resorting to non-standard approaches.
Normally, DTMF tones from the DTMF aggregator would be filtered out because of the mute, so I created a custom DTMF aggregator to handle this. However, UserIdleController still triggers while the user is muted: it does not register any user activity, and it is not aware of the user's mute state.
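For reference, the authentication flow looks roughly like the sketch below. This is a simplified, framework-free illustration rather than my actual processor: `DtmfAuthGate`, its `expected_code` and `terminator` parameters, and the `#` terminator convention are all assumptions made for the example, and no Pipecat classes are used.

```python
from typing import List


class DtmfAuthGate:
    """Hypothetical DTMF authentication gate; not part of Pipecat."""

    def __init__(self, expected_code: str, terminator: str = "#") -> None:
        self.expected_code = expected_code
        self.terminator = terminator
        self.muted = True  # the user starts muted
        self._digits: List[str] = []

    def on_dtmf(self, key: str) -> bool:
        """Consume one DTMF key; return True once the user is authenticated."""
        if not self.muted:
            return True  # already authenticated, nothing left to collect
        if key != self.terminator:
            self._digits.append(key)
            return False
        # Terminator received: compare the collected digits, then reset
        # the buffer so that a failed attempt can simply be retried.
        entered = "".join(self._digits)
        self._digits.clear()
        if entered == self.expected_code:
            self.muted = False  # unmute on successful authentication
        return not self.muted
```

While `muted` is True here, the only user input is DTMF, so an idle timer that waits for speech has nothing to observe and fires even though the user is actively entering digits.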
Regardless of what I was trying to achieve, I’d love to hear your thoughts on this matter.