Commit 447708f
feat(realtime): EOU-driven semantic_vad turn detection

Add a `semantic_vad` turn-detection mode to the realtime API that feeds
the transcription model live and decides "the user finished speaking"
from the `<EOU>` end-of-utterance token rather than from silence alone.
When EOU fires the turn commits immediately (~0.3s); otherwise it falls
back to an eagerness-scaled silence threshold (low/med/high = 8/4/2s).

Plumbing, bottom to top:
- proto: `AudioTranscriptionLive` bidirectional RPC (config-first oneof,
  mono float PCM @16k, ready-ack / Unimplemented degrade signal) plus
  `TranscriptResult.eou` for the unary retranscribe gate.
- pkg/grpc: client/server/base/embed scaffolding for the bidi stream,
  modeled on AudioTransformStream; release stream conns on terminal Recv.
- parakeet-cpp: live transcription RPC with per-C-call engine locking
  (one live stream per turn, finalize+free at commit); bump parakeet.cpp
  to ABI v5: incremental StreamingMel (no more quadratic per-feed mel
  recompute that delayed EOU on long turns) and the <EOU>/<EOB> split;
  strip the literal <EOU>/<EOB> from offline text and set Eou.
- core/backend: LiveTranscriptionSession wrapper + pipeline
  `turn_detection:` config block (type/eagerness/retranscribe).
- realtime: semantic_vad integration: live input captions streamed as
  transcription deltas while the user speaks, EOU-immediate commit with
  eagerness fallback, optional retranscribe gate (batch re-decode must
  also end in <EOU> to confirm), clause synthesis off the LLM token
  callback, and per-turn live-transcription / model_load telemetry.
- UI: show the realtime pipeline components as a vertical list.

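
The token stripping and the retranscribe gate from the list above can be sketched like this. It is an illustration under assumptions, not the backend's actual code: `cleanTranscript` and `confirmEOU` are hypothetical helpers, and treating a trailing `<EOU>` as the signal for setting the EOU flag is an assumption about how the backend decides.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	tokenEOU = "<EOU>"
	tokenEOB = "<EOB>"
)

// cleanTranscript removes the literal <EOU>/<EOB> tokens from decoded
// text and reports whether the raw text ended in <EOU>.
// Hypothetical helper; the suffix check is an assumption of this sketch.
func cleanTranscript(raw string) (text string, eou bool) {
	trimmed := strings.TrimSpace(raw)
	eou = strings.HasSuffix(trimmed, tokenEOU)
	stripped := strings.NewReplacer(tokenEOU, "", tokenEOB, "").Replace(trimmed)
	// Collapse whitespace left behind by the removed tokens.
	return strings.Join(strings.Fields(stripped), " "), eou
}

// confirmEOU applies the optional retranscribe gate: a live EOU is
// accepted as-is unless the gate is enabled, in which case the batch
// re-decode must also end in <EOU>.
func confirmEOU(liveEOU, retranscribe bool, batchRaw string) bool {
	if !liveEOU {
		return false
	}
	if !retranscribe {
		return true
	}
	_, batchEOU := cleanTranscript(batchRaw)
	return batchEOU
}

func main() {
	text, eou := cleanTranscript("turn the lights off <EOU>")
	fmt.Printf("%q %v\n", text, eou)
	fmt.Println(confirmEOU(true, true, "turn the lights"))
}
```

With the gate enabled, a live EOU that the batch pass does not reproduce leaves the turn open, so the silence fallback decides instead.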
Docs and tests included; opt-in via the pipeline YAML or per-session
`session.update`. Non-streaming STT backends degrade to silence-only.

Assisted-by: Claude Code:claude-opus-4-8 [Read] [Edit] [Write] [Bash]
Assisted-by: Claude Code:claude-fable-5 [Read] [Edit] [Bash]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

1 parent 62c99c1 commit 447708f
49 files changed
Lines changed: 4107 additions & 255 deletions
File tree
- backend
- go/parakeet-cpp
- core
- application
- backend
- config
- meta
- http
- endpoints/openai
- react-ui
- e2e
- src/pages
- schema
- services/nodes
- trace
- docs/content/features
- pkg
- grpc
- base
- grpcerrors
- model
- sound