feat(voice): energy-based Voice Activity Detector (#5896) #5976

Open
hurtdidit wants to merge 8 commits into zeroclaw-labs:master from hurtdidit:feat/energy-vad

Conversation

@hurtdidit
Contributor

Summary

Validation Evidence (required)

All commands run locally on feat/energy-vad branch (commit ed6e4281):

cargo fmt --all -- --check  # passed (no output)
cargo clippy --locked --all-targets -- -D clippy::correctness  # passed (pre-existing warnings in other crates, zero new warnings)
cargo test  # 821+ tests passed

Key test output tails:

# zeroclaw-api (43 tests)
test vad::tests::energy_vad_custom_threshold ... ok
test vad::tests::energy_vad_custom_timeout ... ok
test vad::tests::energy_vad_full_cycle_start_silence_end ... ok
test vad::tests::energy_vad_no_speech_end_before_timeout ... ok
test vad::tests::energy_vad_resets_timeout_on_continuous_speech ... ok
test vad::tests::energy_vad_silence_on_quiet_input ... ok
test vad::tests::energy_vad_speech_end_after_timeout ... ok
test vad::tests::energy_vad_speech_start_again_after_end ... ok
test vad::tests::energy_vad_speech_start_on_loud_input ... ok
test vad::tests::noop_vad_always_silence ... ok
test vad::tests::rms_energy_* ... ok
test result: ok. 43 passed; 0 failed

# zeroclaw-gateway (138 tests)
test voice_duplex::tests::voice_duplex_session_from_config_custom ... ok
test voice_duplex::tests::voice_duplex_session_process_frame_detects_speech ... ok
test result: ok. 138 passed; 0 failed

# zeroclaw-config (553 tests)
test result: ok. 553 passed; 0 failed

# Full suite (lib + bin + component + integration + system)
test result: ok. 230 passed  (lib)
test result: ok. 222 passed  (bin)
test result: ok. 205 passed  (component)
test result: ok. 159 passed  (integration)
test result: ok. 5 passed    (system)

  • Beyond CI: what did you manually verify? Manually inspected the state machine transitions: Silent→Speaking on loud input; Speaking→Silent after the configurable timeout; brief pauses within speech do not prematurely emit SpeechEnd; and the full cycle (start→continue→end→restart) works correctly. Also verified config deserialization with custom thresholds, and that from_config() creates the correct VAD instances.
  • If any command was intentionally skipped, why: None skipped.

Security & Privacy Impact (required)

  • New permissions, capabilities, or file system access scope? No
  • New external network calls? No
  • Secrets / tokens / credentials handling changed? No
  • PII, real identities, or personal data in diff, tests, fixtures, or docs? No

Compatibility (required)

  • Backward compatible? Yes. NoopVad is preserved, the new fields have safe defaults via serde defaults, and VoiceDuplexSession::new() still works.
  • Config / env / CLI surface changed? Yes — Two new optional config fields on [channels.voice_duplex]: vad_energy_threshold (default 0.01) and vad_silence_timeout_ms (default 500). Existing configs without these fields continue to work with defaults.
  • Exact upgrade steps for existing users: No action required. New fields are optional with sensible defaults.
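
For illustration, a config fragment that overrides both new fields might look like the following. The section name, field names, and defaults come from the description above; the override values are arbitrary, and the comments reflect an assumed reading of how the threshold and timeout are applied.

```toml
[channels.voice_duplex]
enabled = true
# Assumed meaning: RMS level (on normalised samples) at or above which
# a frame counts as speech. Default is 0.01.
vad_energy_threshold = 0.02
# Assumed meaning: milliseconds of continuous quiet before SpeechEnd
# is emitted. Default is 500.
vad_silence_timeout_ms = 800
```

Omitting either field should leave its default in effect, as stated above.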

Rollback (required for risk: medium and risk: high)

This is a low-risk PR; the rollback plan is git revert <sha>.

  • Feature flags or config toggles: Entire change is gated by gateway-voice-duplex feature flag.

i18n Follow-Through

N.A. — No docs or user-facing wording changes.

Frank Hurt and others added 7 commits April 20, 2026 06:23
…plex feature flag (zeroclaw-labs#5896)

PR 1: Voice Event Protocol + Vad Trait + Feature Flag

- Add Vad trait and NoopVad to zeroclaw-api/src/vad.rs
- Add VoiceEvent enum (speech_start, speech_end, barge_in, tts_cancel, tts_chunk)
- Add VoiceDuplexConfig to zeroclaw-config schema with enabled flag
- Add gateway-voice-duplex feature flag to zeroclaw-gateway
- Add voice_duplex module with try_parse_voice_event and handle_voice_event
- Patch ws.rs message loop with dual-gate dispatch (compile + runtime)

Dual-gate ensures zero impact when feature is off at either level.
All events use text JSON frames - no binary frame changes needed.

Refs: zeroclaw-labs#5896
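
The dual-gate idea described in this commit can be sketched roughly as below. This is not the code from ws.rs: the function name `voice_dispatch_enabled` is hypothetical, the config struct is reduced to the one `enabled` field named above, and the `cfg!` macro stands in for the `#[cfg(feature = ...)]` attributes so that the sketch compiles with or without the feature.

```rust
/// Reduced stand-in for the runtime config; only `enabled` is named in the PR.
struct VoiceDuplexConfig {
    enabled: bool,
}

/// Hypothetical helper: returns true only when both gates are open.
fn voice_dispatch_enabled(config: Option<&VoiceDuplexConfig>) -> bool {
    // Compile-time gate: evaluates to a constant `false` when the
    // gateway-voice-duplex feature is not enabled for this build.
    let compiled_in = cfg!(feature = "gateway-voice-duplex");
    // Runtime gate: a missing config section counts as disabled.
    let enabled_at_runtime = config.map_or(false, |c| c.enabled);
    compiled_in && enabled_at_runtime
}
```

With either gate closed the message loop would skip voice handling entirely, which is the "zero impact" property the commit message claims.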
…als (zeroclaw-labs#5896)

CI fix: upstream clippy with --features ci-all caught 3 missing voice_duplex
fields in test struct literal constructions of ChannelsConfig.

Added voice_duplex: None to all ChannelsConfig initializers in schema.rs tests.
… gateway, add error frames, reorder config check (zeroclaw-labs#5896)

- Move VoiceEvent enum + serde roundtrip tests from zeroclaw-api/src/vad.rs
  to zeroclaw-gateway/src/voice_duplex.rs (RFC zeroclaw-labs#5574 §4.2 compliance)
- Vad trait, VadEvent, NoopVad remain in zeroclaw-api (correct placement)
- handle_voice_event now returns Option<serde_json::Value> error frame
  for server→client events received from client (invalid_event_direction)
- ws.rs: check runtime config before parsing voice events (avoid redundant
  JSON parse on every message when duplex compiled but not enabled)
- Add tests for error frame and no-error paths
…VAD pipeline (zeroclaw-labs#5896)

Extend WebSocket gateway to accept binary audio frames (PCM16 LE mono
16kHz) alongside text JSON events from PR 1.

voice_duplex.rs additions:
- audio module: PCM16 constants (sample rate, frame size limits)
- AudioFrameError enum with Display impl
- validate_pcm16_frame(): frame size validation (10ms-300ms)
- pcm16_to_f32(): PCM16 LE to normalised f32 conversion
- VoiceDuplexSession: per-session state with binary flag and Vad instance
- CAP_BINARY_AUDIO constant for capability negotiation
- 12 new unit tests (all passing)

ws.rs changes:
- Protocol negotiation in connect handshake (binary-audio capability)
- Server confirms capability in ack with audio_format metadata
- Message::Binary arm: validate → convert → feed to VAD pipeline
- Reject with clear error codes when duplex disabled or not negotiated
- All gated by #[cfg(feature = "gateway-voice-duplex")]
- Updated protocol documentation comment

Acceptance criteria met:
- Binary frames accepted when duplex enabled ✓
- Binary frames rejected when duplex disabled ✓
- No impact on existing text-only WebSocket clients ✓
- cargo check / cargo test pass ✓

Depends-on: PR zeroclaw-labs#5942 (voice event protocol + Vad trait)
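
As a rough sketch of the frame validation and conversion steps listed in this commit, assuming PCM16 LE mono at 16 kHz and the 10ms-300ms bounds stated above. The function and enum names follow the commit message, but the signatures, the error variants, and the division by 32768 are assumptions rather than the PR's actual code.

```rust
use std::fmt;

const SAMPLE_RATE_HZ: usize = 16_000;
const BYTES_PER_SAMPLE: usize = 2;
// 10 ms of mono PCM16 at 16 kHz: 160 samples, 320 bytes.
const MIN_FRAME_BYTES: usize = SAMPLE_RATE_HZ / 100 * BYTES_PER_SAMPLE;
// 300 ms of mono PCM16 at 16 kHz: 4800 samples, 9600 bytes.
const MAX_FRAME_BYTES: usize = SAMPLE_RATE_HZ * 300 / 1000 * BYTES_PER_SAMPLE;

/// Assumed error variants; the PR only names the enum itself.
#[derive(Debug, PartialEq)]
enum AudioFrameError {
    OddLength(usize),
    TooShort(usize),
    TooLong(usize),
}

impl fmt::Display for AudioFrameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::OddLength(n) => write!(f, "frame length {n} is not a multiple of 2"),
            Self::TooShort(n) => write!(f, "frame of {n} bytes is below {MIN_FRAME_BYTES}"),
            Self::TooLong(n) => write!(f, "frame of {n} bytes exceeds {MAX_FRAME_BYTES}"),
        }
    }
}

/// Checks that a binary frame holds whole samples and 10-300 ms of audio.
fn validate_pcm16_frame(bytes: &[u8]) -> Result<(), AudioFrameError> {
    let len = bytes.len();
    if len % BYTES_PER_SAMPLE != 0 {
        return Err(AudioFrameError::OddLength(len));
    }
    if len < MIN_FRAME_BYTES {
        return Err(AudioFrameError::TooShort(len));
    }
    if len > MAX_FRAME_BYTES {
        return Err(AudioFrameError::TooLong(len));
    }
    Ok(())
}

/// Converts little-endian PCM16 bytes to f32 samples in [-1.0, 1.0).
fn pcm16_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(BYTES_PER_SAMPLE)
        .map(|pair| f32::from(i16::from_le_bytes([pair[0], pair[1]])) / 32768.0)
        .collect()
}
```

A handler for Message::Binary would presumably call `validate_pcm16_frame` first and only convert and feed the VAD on `Ok(())`.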
…claw-labs#5896)

Split ack construction into cfg(feature) and cfg(not) blocks to avoid
unused mut warning. Prefix voice_session with underscore in non-feature
fallback path. No functional changes.
…eroclaw-labs#5896)

Implement EnergyVad using RMS amplitude analysis with configurable
threshold and silence timeout. Wire into VoiceDuplexSession via
VoiceDuplexConfig, replacing the placeholder NoopVad.

Changes:
- zeroclaw-api/vad.rs: Add EnergyVad struct with RMS-based state machine
  (Silent/Speaking) and SpeechStart/SpeechEnd transition detection.
  Add compute_rms_energy() utility. 11 new unit tests.
- zeroclaw-config/schema.rs: Add vad_energy_threshold (default 0.01)
  and vad_silence_timeout_ms (default 500) to VoiceDuplexConfig.
- zeroclaw-gateway/voice_duplex.rs: Add from_config() constructor,
  update new() to delegate. Update tests for EnergyVad behavior.
- zeroclaw-gateway/ws.rs: Pass VoiceDuplexConfig to session creation.

All 734 tests pass (api: 43, gateway: 138, config: 553).
Part of zeroclaw-labs#5896
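
A minimal sketch of an RMS-based two-state detector along the lines described in this commit. It is not the PR's implementation: the `VadEvent` variants other than SpeechStart/SpeechEnd, the `process_frame` method name, the fixed 16 kHz sample rate, and measuring silence by counting samples instead of reading a clock are all assumptions made for illustration.

```rust
/// Assumed event set; the PR names only SpeechStart and SpeechEnd.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VadEvent {
    Silence,
    SpeechStart,
    Speech,
    SpeechEnd,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Silent,
    Speaking,
}

/// Root-mean-square amplitude of normalised samples; 0.0 for an empty frame.
fn compute_rms_energy(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

struct EnergyVad {
    threshold: f32,
    silence_timeout_ms: u64,
    sample_rate_hz: u64, // assumption: fixed at 16 kHz
    state: State,
    silent_samples: u64, // quiet samples seen since the last loud frame
}

impl EnergyVad {
    fn new(threshold: f32, silence_timeout_ms: u64) -> Self {
        Self {
            threshold,
            silence_timeout_ms,
            sample_rate_hz: 16_000,
            state: State::Silent,
            silent_samples: 0,
        }
    }

    /// Feeds one frame and reports the resulting event.
    fn process_frame(&mut self, samples: &[f32]) -> VadEvent {
        let loud = compute_rms_energy(samples) >= self.threshold;
        match (self.state, loud) {
            (State::Silent, false) => VadEvent::Silence,
            (State::Silent, true) => {
                self.state = State::Speaking;
                self.silent_samples = 0;
                VadEvent::SpeechStart
            }
            (State::Speaking, true) => {
                // Any loud frame resets the silence timer.
                self.silent_samples = 0;
                VadEvent::Speech
            }
            (State::Speaking, false) => {
                self.silent_samples += samples.len() as u64;
                let silent_ms = self.silent_samples * 1000 / self.sample_rate_hz;
                if silent_ms >= self.silence_timeout_ms {
                    self.state = State::Silent;
                    self.silent_samples = 0;
                    VadEvent::SpeechEnd
                } else {
                    // Brief pause: still treated as part of the utterance.
                    VadEvent::Speech
                }
            }
        }
    }
}
```

Counting samples keeps the sketch deterministic under test; a wall-clock timer would behave differently if frames arrive late or in bursts.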
@singlerider requested a review from Audacity88 April 29, 2026 08:17
@singlerider added the enhancement, risk: high, size: XL, channel, and needs-author-action labels Apr 29, 2026
@Audacity88
Collaborator

Thanks for the PR. Before I do a full review, could you please update this branch against current master and resolve the merge conflicts?

GitHub currently reports the PR as dirty. The checks on the current head are green, which is helpful, but because this is a risk: high / size: XL voice-duplex change stacked on the related voice work, I would rather review it on a conflict-free head than spend the full pass on a stale merge state.

Once the branch is updated, please refresh the validation evidence in the PR body if anything changes during conflict resolution. The main checks I would expect to still be relevant are:

cargo fmt --all -- --check
cargo clippy --locked --all-targets -- -D clippy::correctness
cargo test

After that I can review the current energy-VAD / voice-duplex integration diff properly.

Labels

  • channel: Auto scope: src/channels/** changed.
  • enhancement: New feature or request
  • needs-author-action: Author action required before merge
  • risk: high: Auto risk: security/runtime/gateway/tools/workflows.
  • size: XL: Auto size: >1000 non-doc changed lines.

4 participants