feat(voice): energy-based Voice Activity Detector (#5896) #5976
Open
hurtdidit wants to merge 8 commits into
…plex feature flag (zeroclaw-labs#5896)

PR 1: Voice Event Protocol + Vad Trait + Feature Flag

- Add Vad trait and NoopVad to zeroclaw-api/src/vad.rs
- Add VoiceEvent enum (speech_start, speech_end, barge_in, tts_cancel, tts_chunk)
- Add VoiceDuplexConfig to zeroclaw-config schema with enabled flag
- Add gateway-voice-duplex feature flag to zeroclaw-gateway
- Add voice_duplex module with try_parse_voice_event and handle_voice_event
- Patch ws.rs message loop with dual-gate dispatch (compile + runtime)

Dual-gate ensures zero impact when feature is off at either level. All events use text JSON frames - no binary frame changes needed.

Refs: zeroclaw-labs#5896
…als (zeroclaw-labs#5896)

CI fix: upstream clippy with --features ci-all caught 3 missing voice_duplex fields in test struct literal constructions of ChannelsConfig. Added voice_duplex: None to all ChannelsConfig initializers in schema.rs tests.
… gateway, add error frames, reorder config check (zeroclaw-labs#5896)

- Move VoiceEvent enum + serde roundtrip tests from zeroclaw-api/src/vad.rs to zeroclaw-gateway/src/voice_duplex.rs (RFC zeroclaw-labs#5574 §4.2 compliance)
- Vad trait, VadEvent, NoopVad remain in zeroclaw-api (correct placement)
- handle_voice_event now returns Option<serde_json::Value> error frame for server→client events received from client (invalid_event_direction)
- ws.rs: check runtime config before parsing voice events (avoid redundant JSON parse on every message when duplex compiled but not enabled)
- Add tests for error frame and no-error paths
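The ordering described in this commit (runtime config check first, then parse, then direction check) can be sketched roughly as follows. This is not the PR's code: `Dispatch`, `parse_event_name`, `is_server_to_client`, and `dispatch` are hypothetical names, the sketch matches on an already-extracted event name instead of parsing JSON with serde, and treating only `tts_chunk` as a server→client event is an assumption made for illustration. Only the five event names and the `invalid_event_direction` code come from the commit messages.

```rust
/// The five event kinds named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VoiceEvent {
    SpeechStart,
    SpeechEnd,
    BargeIn,
    TtsCancel,
    TtsChunk,
}

/// Hypothetical outcome type for this sketch (the PR returns an
/// Option<serde_json::Value> error frame instead).
#[derive(Debug, PartialEq)]
enum Dispatch {
    /// Duplex disabled or not a voice event: fall through to normal handling.
    NotVoice,
    /// Voice event accepted; nothing to send back.
    Accepted(VoiceEvent),
    /// Voice event rejected; the error code to report to the client.
    Rejected(&'static str),
}

/// Stand-in for JSON parsing: map an event name to a VoiceEvent.
fn parse_event_name(name: &str) -> Option<VoiceEvent> {
    match name {
        "speech_start" => Some(VoiceEvent::SpeechStart),
        "speech_end" => Some(VoiceEvent::SpeechEnd),
        "barge_in" => Some(VoiceEvent::BargeIn),
        "tts_cancel" => Some(VoiceEvent::TtsCancel),
        "tts_chunk" => Some(VoiceEvent::TtsChunk),
        _ => None,
    }
}

/// ASSUMPTION: only tts_chunk is server→client. The real mapping is not
/// stated in the commit messages.
fn is_server_to_client(event: VoiceEvent) -> bool {
    matches!(event, VoiceEvent::TtsChunk)
}

fn dispatch(duplex_enabled: bool, event_name: &str) -> Dispatch {
    // Runtime gate first, so no parsing work happens when duplex is off.
    if !duplex_enabled {
        return Dispatch::NotVoice;
    }
    match parse_event_name(event_name) {
        None => Dispatch::NotVoice,
        Some(event) if is_server_to_client(event) => {
            Dispatch::Rejected("invalid_event_direction")
        }
        Some(event) => Dispatch::Accepted(event),
    }
}
```

Checking the flag before parsing means a gateway built with the feature but running with duplex disabled does no extra work per message.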
…VAD pipeline (zeroclaw-labs#5896)

Extend WebSocket gateway to accept binary audio frames (PCM16 LE mono 16kHz) alongside text JSON events from PR 1.

voice_duplex.rs additions:
- audio module: PCM16 constants (sample rate, frame size limits)
- AudioFrameError enum with Display impl
- validate_pcm16_frame(): frame size validation (10ms-300ms)
- pcm16_to_f32(): PCM16 LE to normalised f32 conversion
- VoiceDuplexSession: per-session state with binary flag and Vad instance
- CAP_BINARY_AUDIO constant for capability negotiation
- 12 new unit tests (all passing)

ws.rs changes:
- Protocol negotiation in connect handshake (binary-audio capability)
- Server confirms capability in ack with audio_format metadata
- Message::Binary arm: validate → convert → feed to VAD pipeline
- Reject with clear error codes when duplex disabled or not negotiated
- All gated by #[cfg(feature = "gateway-voice-duplex")]
- Updated protocol documentation comment

Acceptance criteria met:
- Binary frames accepted when duplex enabled ✓
- Binary frames rejected when duplex disabled ✓
- No impact on existing text-only WebSocket clients ✓
- cargo check / cargo test pass ✓

Depends-on: PR zeroclaw-labs#5942 (voice event protocol + Vad trait)
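As a rough illustration of the validate → convert steps above, here is a minimal sketch assuming PCM16 LE mono at 16 kHz, so that the 10ms-300ms limits correspond to 320 and 9600 bytes. The function and enum names are taken from the commit message, but the signatures, constant names, error variants, and the divide-by-32768 normalisation are assumptions and may differ from the PR.

```rust
// Assumed constants for PCM16 LE mono at 16 kHz.
const SAMPLE_RATE_HZ: usize = 16_000;
const BYTES_PER_SAMPLE: usize = 2;
/// 10 ms of audio: 160 samples, 320 bytes.
const MIN_FRAME_BYTES: usize = SAMPLE_RATE_HZ / 100 * BYTES_PER_SAMPLE;
/// 300 ms of audio: 4800 samples, 9600 bytes.
const MAX_FRAME_BYTES: usize = SAMPLE_RATE_HZ * 3 / 10 * BYTES_PER_SAMPLE;

/// Hypothetical error variants; the PR's AudioFrameError may differ.
#[derive(Debug, PartialEq)]
enum AudioFrameError {
    OddLength { len: usize },
    TooShort { len: usize },
    TooLong { len: usize },
}

impl std::fmt::Display for AudioFrameError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            AudioFrameError::OddLength { len } => {
                write!(f, "frame of {len} bytes is not a whole number of 16-bit samples")
            }
            AudioFrameError::TooShort { len } => {
                write!(f, "frame of {len} bytes is shorter than {MIN_FRAME_BYTES} bytes (10 ms)")
            }
            AudioFrameError::TooLong { len } => {
                write!(f, "frame of {len} bytes is longer than {MAX_FRAME_BYTES} bytes (300 ms)")
            }
        }
    }
}

/// Check that a binary frame holds between 10 ms and 300 ms of PCM16 audio.
fn validate_pcm16_frame(bytes: &[u8]) -> Result<(), AudioFrameError> {
    let len = bytes.len();
    if len % BYTES_PER_SAMPLE != 0 {
        return Err(AudioFrameError::OddLength { len });
    }
    if len < MIN_FRAME_BYTES {
        return Err(AudioFrameError::TooShort { len });
    }
    if len > MAX_FRAME_BYTES {
        return Err(AudioFrameError::TooLong { len });
    }
    Ok(())
}

/// Convert little-endian 16-bit samples to f32 in the range [-1.0, 1.0).
fn pcm16_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(BYTES_PER_SAMPLE)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]) as f32 / 32768.0)
        .collect()
}
```

Dividing by 32768 maps `i16::MIN` to exactly -1.0 and keeps positive samples just below 1.0. Other normalisation conventions exist, and the commit message does not say which one the PR uses.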
…claw-labs#5896)

Split ack construction into cfg(feature) and cfg(not) blocks to avoid unused mut warning. Prefix voice_session with underscore in non-feature fallback path. No functional changes.
…eroclaw-labs#5896)

Implement EnergyVad using RMS amplitude analysis with configurable threshold and silence timeout. Wire into VoiceDuplexSession via VoiceDuplexConfig, replacing the placeholder NoopVad.

Changes:
- zeroclaw-api/vad.rs: Add EnergyVad struct with RMS-based state machine (Silent/Speaking) and SpeechStart/SpeechEnd transition detection. Add compute_rms_energy() utility. 11 new unit tests.
- zeroclaw-config/schema.rs: Add vad_energy_threshold (default 0.01) and vad_silence_timeout_ms (default 500) to VoiceDuplexConfig.
- zeroclaw-gateway/voice_duplex.rs: Add from_config() constructor, update new() to delegate. Update tests for EnergyVad behavior.
- zeroclaw-gateway/ws.rs: Pass VoiceDuplexConfig to session creation.

All 734 tests pass (api: 43, gateway: 138, config: 553).

Part of zeroclaw-labs#5896
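To make the Silent/Speaking state machine concrete, here is a minimal sketch of an RMS-based detector. It is not the PR's implementation. The `process` method and its `Option<VadEvent>` return type, the `>=` threshold comparison, and measuring silence by accumulating frame durations at an assumed 16 kHz sample rate are all assumptions for illustration. The type and function names and the 0.01 / 500 ms defaults are the ones mentioned in the commit message.

```rust
/// Transition events; the real VadEvent in zeroclaw-api may have other variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VadEvent {
    SpeechStart,
    SpeechEnd,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum VadState {
    Silent,
    Speaking,
}

/// Root-mean-square amplitude of normalised samples; 0.0 for an empty frame.
fn compute_rms_energy(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

struct EnergyVad {
    threshold: f32,
    silence_timeout_ms: u64,
    sample_rate_hz: u64, // ASSUMPTION: fixed at 16 kHz in this sketch
    state: VadState,
    silent_ms: u64,
}

impl EnergyVad {
    fn new(threshold: f32, silence_timeout_ms: u64) -> Self {
        Self {
            threshold,
            silence_timeout_ms,
            sample_rate_hz: 16_000,
            state: VadState::Silent,
            silent_ms: 0,
        }
    }

    /// Feed one frame; returns an event only when the state changes.
    fn process(&mut self, samples: &[f32]) -> Option<VadEvent> {
        let frame_ms = samples.len() as u64 * 1000 / self.sample_rate_hz;
        let loud = compute_rms_energy(samples) >= self.threshold;
        match (self.state, loud) {
            (VadState::Silent, true) => {
                self.state = VadState::Speaking;
                self.silent_ms = 0;
                Some(VadEvent::SpeechStart)
            }
            (VadState::Speaking, true) => {
                // Any loud frame resets the silence counter.
                self.silent_ms = 0;
                None
            }
            (VadState::Speaking, false) => {
                self.silent_ms += frame_ms;
                if self.silent_ms >= self.silence_timeout_ms {
                    self.state = VadState::Silent;
                    self.silent_ms = 0;
                    Some(VadEvent::SpeechEnd)
                } else {
                    None
                }
            }
            (VadState::Silent, false) => None,
        }
    }
}
```

With 20 ms frames and the defaults, a single loud frame yields SpeechStart, and SpeechEnd follows on the 25th consecutive quiet frame, once 500 ms of silence has accumulated. The timeout keeps short pauses between words from ending the utterance.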
Collaborator
Thanks for the PR. Before I do a full review, could you please update this branch against current … GitHub currently reports the PR as dirty. The checks on the current head are green, which is helpful, but because this is a … Once the branch is updated, please refresh the validation evidence in the PR body if anything changes during conflict resolution. The main checks I would expect to still be relevant are:

cargo fmt --all -- --check
cargo clippy --locked --all-targets -- -D clippy::correctness
cargo test

After that I can review the current energy-VAD / voice-duplex integration diff properly.
This was referenced May 4, 2026
Summary
- … master (all contributions)
- … EnergyVad, a real energy-based Voice Activity Detector using RMS amplitude analysis, replacing the placeholder NoopVad that always returned Silence.
- … vad_energy_threshold (default 0.01) and vad_silence_timeout_ms (default 500) configuration fields to VoiceDuplexConfig.
- … EnergyVad into VoiceDuplexSession via a new from_config() constructor, so VAD parameters are configurable per-instance.
- … ws.rs to pass config to session creation.
- … gateway-voice-duplex feature flag. Existing NoopVad is preserved for backward compatibility. New config fields have safe defaults.

Validation Evidence (required)
All commands run locally on feat/energy-vad branch (commit ed6e4281):

Key test output tails:
- … from_config() creates correct VAD instances.

Security & Privacy Impact (required)
Compatibility (required)
- … NoopVad is preserved, new fields have safe defaults via serde defaults, VoiceDuplexSession::new() still works.
- … [channels.voice_duplex]: vad_energy_threshold (default 0.01) and vad_silence_timeout_ms (default 500). Existing configs without these fields continue to work with defaults.

Rollback (required for risk: medium and risk: high)

This is a low-risk PR.
- … git revert <sha> is the plan.
- … gateway-voice-duplex feature flag.

i18n Follow-Through
N.A. — No docs or user-facing wording changes.