feat(voice): energy-based Voice Activity Detector (#5896) by hurtdidit · Pull Request #5976 · zeroclaw-labs/zeroclaw

hurtdidit · 2026-04-21T18:21:26Z

Summary

Base branch: master (all contributions)
What changed and why:
- Implements EnergyVad, a real energy-based Voice Activity Detector using RMS amplitude analysis, replacing the placeholder NoopVad that always returned Silence.
- Adds vad_energy_threshold (default 0.01) and vad_silence_timeout_ms (default 500) configuration fields to VoiceDuplexConfig.
- Wires EnergyVad into VoiceDuplexSession via a new from_config() constructor, so VAD parameters are configurable per-instance.
- Updates ws.rs to pass config to session creation.
Scope boundary: Only replaces the VAD implementation — does not change the binary audio frame pipeline, voice event protocol, TTS/STT subsystems, or WebSocket connection handling.
Blast radius: Minimal. Changes are fully gated behind the gateway-voice-duplex feature flag. The existing NoopVad is preserved for backward compatibility. New config fields have safe defaults.
Linked issue(s): Related to #5896 ([Feature]: Full-duplex voice conversation with barge-in support). Depends on #5974 (feat(voice): WebSocket binary audio frames with PCM16 validation and VAD pipeline, stacked) and #5942 (feat(voice): add Vad trait, VoiceEvent protocol, and gateway-voice-duplex feature flag, stacked).
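
For readers who have not opened the diff, the RMS-based state machine described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Speech` variant, the `process` method name, the private `State` enum, and counting silence in samples at a fixed 16 kHz rate are all assumptions made here for a deterministic example. Only `EnergyVad`, `compute_rms_energy`, the Silent/Speaking states, and the SpeechStart/SpeechEnd transitions are named in the PR.

```rust
/// Per-frame result. `Speech` (ongoing speech) is an assumed variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VadEvent {
    Silence,
    SpeechStart,
    Speech,
    SpeechEnd,
}

/// Root-mean-square amplitude of normalised samples in [-1.0, 1.0].
pub fn compute_rms_energy(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Silent,
    Speaking,
}

/// Illustrative energy VAD; field layout is hypothetical.
pub struct EnergyVad {
    threshold: f32,
    silence_timeout_ms: u64,
    sample_rate_hz: u64,
    state: State,
    /// Consecutive quiet samples seen while in the Speaking state.
    silent_samples: u64,
}

impl EnergyVad {
    pub fn new(threshold: f32, silence_timeout_ms: u64) -> Self {
        Self {
            threshold,
            silence_timeout_ms,
            sample_rate_hz: 16_000, // assumed: PCM16 mono at 16 kHz
            state: State::Silent,
            silent_samples: 0,
        }
    }

    /// Classify one frame and advance the Silent/Speaking state machine.
    pub fn process(&mut self, samples: &[f32]) -> VadEvent {
        let loud = compute_rms_energy(samples) >= self.threshold;
        match (self.state, loud) {
            (State::Silent, false) => VadEvent::Silence,
            (State::Silent, true) => {
                self.state = State::Speaking;
                self.silent_samples = 0;
                VadEvent::SpeechStart
            }
            (State::Speaking, true) => {
                // Any loud frame resets the silence timer.
                self.silent_samples = 0;
                VadEvent::Speech
            }
            (State::Speaking, false) => {
                self.silent_samples += samples.len() as u64;
                let silent_ms = self.silent_samples * 1000 / self.sample_rate_hz;
                if silent_ms >= self.silence_timeout_ms {
                    self.state = State::Silent;
                    self.silent_samples = 0;
                    VadEvent::SpeechEnd
                } else {
                    // Brief pause: still treated as part of the utterance.
                    VadEvent::Speech
                }
            }
        }
    }
}
```

With the defaults (threshold 0.01, timeout 500 ms) and 20 ms frames, a loud frame yields SpeechStart, and SpeechEnd is only reported once 25 consecutive quiet frames have accumulated in this sketch.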

Validation Evidence (required)

All commands run locally on feat/energy-vad branch (commit ed6e4281):

cargo fmt --all -- --check  # passed (no output)
cargo clippy --locked --all-targets -- -D clippy::correctness  # passed (pre-existing warnings in other crates, zero new warnings)
cargo test  # 821+ tests passed

Key test output tails:

# zeroclaw-api (43 tests)
test vad::tests::energy_vad_custom_threshold ... ok
test vad::tests::energy_vad_custom_timeout ... ok
test vad::tests::energy_vad_full_cycle_start_silence_end ... ok
test vad::tests::energy_vad_no_speech_end_before_timeout ... ok
test vad::tests::energy_vad_resets_timeout_on_continuous_speech ... ok
test vad::tests::energy_vad_silence_on_quiet_input ... ok
test vad::tests::energy_vad_speech_end_after_timeout ... ok
test vad::tests::energy_vad_speech_start_again_after_end ... ok
test vad::tests::energy_vad_speech_start_on_loud_input ... ok
test vad::tests::noop_vad_always_silence ... ok
test vad::tests::rms_energy_* ... ok
test result: ok. 43 passed; 0 failed

# zeroclaw-gateway (138 tests)
test voice_duplex::tests::voice_duplex_session_from_config_custom ... ok
test voice_duplex::tests::voice_duplex_session_process_frame_detects_speech ... ok
test result: ok. 138 passed; 0 failed

# zeroclaw-config (553 tests)
test result: ok. 553 passed; 0 failed

# Full suite (lib + bin + component + integration + system)
test result: ok. 230 passed  (lib)
test result: ok. 222 passed  (bin)
test result: ok. 205 passed  (component)
test result: ok. 159 passed  (integration)
test result: ok. 5 passed    (system)

Beyond CI, what did you manually verify? Manually inspected the state machine transitions: Silent→Speaking on loud input; Speaking→Silent after the configurable timeout; brief pauses within speech do not emit SpeechEnd prematurely; and the full cycle (start→continue→end→restart) works correctly. Also verified config deserialization with custom thresholds, and that from_config() creates correctly configured VAD instances.
If any command was intentionally skipped, why: None skipped.

Security & Privacy Impact (required)

New permissions, capabilities, or file system access scope? No
New external network calls? No
Secrets / tokens / credentials handling changed? No
PII, real identities, or personal data in diff, tests, fixtures, or docs? No

Compatibility (required)

Backward compatible? Yes — NoopVad is preserved, new fields have safe defaults via serde defaults, VoiceDuplexSession::new() still works.
Config / env / CLI surface changed? Yes — Two new optional config fields on [channels.voice_duplex]: vad_energy_threshold (default 0.01) and vad_silence_timeout_ms (default 500). Existing configs without these fields continue to work with defaults.
Exact upgrade steps for existing users: No action required. New fields are optional with sensible defaults.
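
To illustrate why existing configs keep working, here is a small std-only sketch of the fallback behaviour. It is an approximation under stated assumptions: the PR says the real fields use serde defaults, whereas this example imitates that with `Default` and `Option`; the `resolve` helper, the `f32`/`u64` field types, and `enabled` defaulting to `false` are hypothetical.

```rust
// A config that omits the new keys, e.g.
//
//   [channels.voice_duplex]
//   enabled = true
//
// is assumed to resolve to threshold 0.01 and timeout 500 ms.

/// Hypothetical mirror of the duplex config section (types are assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct VoiceDuplexConfig {
    pub enabled: bool,
    pub vad_energy_threshold: f32,
    pub vad_silence_timeout_ms: u64,
}

impl Default for VoiceDuplexConfig {
    fn default() -> Self {
        Self {
            enabled: false, // assumed default, not stated in the PR
            vad_energy_threshold: 0.01,
            vad_silence_timeout_ms: 500,
        }
    }
}

/// Hypothetical helper: overlay optional user values on the defaults,
/// imitating what serde field defaults do during deserialization.
pub fn resolve(
    enabled: bool,
    threshold: Option<f32>,
    timeout_ms: Option<u64>,
) -> VoiceDuplexConfig {
    let defaults = VoiceDuplexConfig::default();
    VoiceDuplexConfig {
        enabled,
        vad_energy_threshold: threshold.unwrap_or(defaults.vad_energy_threshold),
        vad_silence_timeout_ms: timeout_ms.unwrap_or(defaults.vad_silence_timeout_ms),
    }
}
```

Calling `resolve(true, None, None)` models an existing config with neither new field set; `resolve(true, Some(0.05), None)` models a user overriding only the threshold.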

Rollback (required for `risk: medium` and `risk: high`)

This is a low-risk PR. The rollback plan is git revert <sha>.

Feature flags or config toggles: Entire change is gated by gateway-voice-duplex feature flag.

i18n Follow-Through

N.A. — No docs or user-facing wording changes.

…plex feature flag (zeroclaw-labs#5896)

PR 1: Voice Event Protocol + Vad Trait + Feature Flag

- Add Vad trait and NoopVad to zeroclaw-api/src/vad.rs
- Add VoiceEvent enum (speech_start, speech_end, barge_in, tts_cancel, tts_chunk)
- Add VoiceDuplexConfig to zeroclaw-config schema with enabled flag
- Add gateway-voice-duplex feature flag to zeroclaw-gateway
- Add voice_duplex module with try_parse_voice_event and handle_voice_event
- Patch ws.rs message loop with dual-gate dispatch (compile + runtime)

Dual-gate ensures zero impact when feature is off at either level. All events use text JSON frames - no binary frame changes needed.

Refs: zeroclaw-labs#5896

…als (zeroclaw-labs#5896) CI fix: upstream clippy with --features ci-all caught 3 missing voice_duplex fields in test struct literal constructions of ChannelsConfig. Added voice_duplex: None to all ChannelsConfig initializers in schema.rs tests.

… gateway, add error frames, reorder config check (zeroclaw-labs#5896)

- Move VoiceEvent enum + serde roundtrip tests from zeroclaw-api/src/vad.rs to zeroclaw-gateway/src/voice_duplex.rs (RFC zeroclaw-labs#5574 §4.2 compliance)
- Vad trait, VadEvent, NoopVad remain in zeroclaw-api (correct placement)
- handle_voice_event now returns Option<serde_json::Value> error frame for server→client events received from client (invalid_event_direction)
- ws.rs: check runtime config before parsing voice events (avoid redundant JSON parse on every message when duplex compiled but not enabled)
- Add tests for error frame and no-error paths
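
The direction check mentioned in this commit could look roughly like the sketch below. Several things here are assumptions for illustration only: the commit says the real function returns `Option<serde_json::Value>`, while this std-only version returns `Option<String>`; the JSON shape of the error frame is invented; and which events count as server→client is not stated in the PR, so only `tts_chunk` is treated that way here.

```rust
/// The five event names listed for the VoiceEvent enum in PR 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VoiceEvent {
    SpeechStart,
    SpeechEnd,
    BargeIn,
    TtsCancel,
    TtsChunk,
}

impl VoiceEvent {
    /// Snake-case name as it would appear in a text JSON frame.
    pub fn wire_name(self) -> &'static str {
        match self {
            VoiceEvent::SpeechStart => "speech_start",
            VoiceEvent::SpeechEnd => "speech_end",
            VoiceEvent::BargeIn => "barge_in",
            VoiceEvent::TtsCancel => "tts_cancel",
            VoiceEvent::TtsChunk => "tts_chunk",
        }
    }

    /// ASSUMPTION: only tts_chunk is treated as server-originated here.
    /// The actual direction table lives in the PR's code, not in this text.
    pub fn is_server_to_client(self) -> bool {
        matches!(self, VoiceEvent::TtsChunk)
    }
}

/// Returns an error frame when a client sends a server→client event,
/// and `None` when the event direction is acceptable.
pub fn handle_voice_event(event: VoiceEvent) -> Option<String> {
    if event.is_server_to_client() {
        // Hypothetical frame layout; only the error code comes from the PR.
        Some(format!(
            r#"{{"type":"error","code":"invalid_event_direction","event":"{}"}}"#,
            event.wire_name()
        ))
    } else {
        None
    }
}
```

Returning `Option` keeps the caller simple: the WebSocket loop only has something to send back when the client got the direction wrong.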

…n type (zeroclaw-labs#5896)

…VAD pipeline (zeroclaw-labs#5896)

Extend WebSocket gateway to accept binary audio frames (PCM16 LE mono 16kHz) alongside text JSON events from PR 1.

voice_duplex.rs additions:
- audio module: PCM16 constants (sample rate, frame size limits)
- AudioFrameError enum with Display impl
- validate_pcm16_frame(): frame size validation (10ms-300ms)
- pcm16_to_f32(): PCM16 LE to normalised f32 conversion
- VoiceDuplexSession: per-session state with binary flag and Vad instance
- CAP_BINARY_AUDIO constant for capability negotiation
- 12 new unit tests (all passing)

ws.rs changes:
- Protocol negotiation in connect handshake (binary-audio capability)
- Server confirms capability in ack with audio_format metadata
- Message::Binary arm: validate → convert → feed to VAD pipeline
- Reject with clear error codes when duplex disabled or not negotiated
- All gated by #[cfg(feature = "gateway-voice-duplex")]
- Updated protocol documentation comment

Acceptance criteria met:
- Binary frames accepted when duplex enabled ✓
- Binary frames rejected when duplex disabled ✓
- No impact on existing text-only WebSocket clients ✓
- cargo check / cargo test pass ✓

Depends-on: PR zeroclaw-labs#5942 (voice event protocol + Vad trait)
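
As a rough picture of the validate → convert step, here is a std-only sketch for PCM16 LE mono at 16 kHz. The constant names, the `AudioFrameError` variants, and the function signatures are assumptions; the commit only names `AudioFrameError`, `validate_pcm16_frame()`, `pcm16_to_f32()`, and the 10ms-300ms window. Dividing by 32768.0 is one common normalisation and may differ from what the PR does.

```rust
use std::fmt;

pub const SAMPLE_RATE_HZ: usize = 16_000;
pub const BYTES_PER_SAMPLE: usize = 2;
/// 10 ms of mono PCM16 at 16 kHz: 160 samples, 320 bytes.
pub const MIN_FRAME_BYTES: usize = SAMPLE_RATE_HZ / 100 * BYTES_PER_SAMPLE;
/// 300 ms of mono PCM16 at 16 kHz: 4800 samples, 9600 bytes.
pub const MAX_FRAME_BYTES: usize = SAMPLE_RATE_HZ * 3 / 10 * BYTES_PER_SAMPLE;

/// Hypothetical variants; each carries the offending byte length.
#[derive(Debug, PartialEq, Eq)]
pub enum AudioFrameError {
    OddLength(usize),
    TooShort(usize),
    TooLong(usize),
}

impl fmt::Display for AudioFrameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::OddLength(n) => write!(f, "frame of {n} bytes is not a whole number of samples"),
            Self::TooShort(n) => write!(f, "frame of {n} bytes is shorter than 10 ms"),
            Self::TooLong(n) => write!(f, "frame of {n} bytes is longer than 300 ms"),
        }
    }
}

/// Check that a binary frame holds whole samples and 10 ms to 300 ms of audio.
pub fn validate_pcm16_frame(bytes: &[u8]) -> Result<(), AudioFrameError> {
    let len = bytes.len();
    if len % BYTES_PER_SAMPLE != 0 {
        return Err(AudioFrameError::OddLength(len));
    }
    if len < MIN_FRAME_BYTES {
        return Err(AudioFrameError::TooShort(len));
    }
    if len > MAX_FRAME_BYTES {
        return Err(AudioFrameError::TooLong(len));
    }
    Ok(())
}

/// Convert little-endian PCM16 bytes to f32 samples in [-1.0, 1.0).
/// A trailing odd byte, if any, is ignored by `chunks_exact`.
pub fn pcm16_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(BYTES_PER_SAMPLE)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]) as f32 / 32768.0)
        .collect()
}
```

Validating before converting means the VAD never sees a partial sample, and the size window bounds how much audio a single WebSocket message can push into the pipeline.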

…claw-labs#5896) Split ack construction into cfg(feature) and cfg(not) blocks to avoid unused mut warning. Prefix voice_session with underscore in non-feature fallback path. No functional changes.

…eroclaw-labs#5896)

Implement EnergyVad using RMS amplitude analysis with configurable threshold and silence timeout. Wire into VoiceDuplexSession via VoiceDuplexConfig, replacing the placeholder NoopVad.

Changes:
- zeroclaw-api/vad.rs: Add EnergyVad struct with RMS-based state machine (Silent/Speaking) and SpeechStart/SpeechEnd transition detection. Add compute_rms_energy() utility. 11 new unit tests.
- zeroclaw-config/schema.rs: Add vad_energy_threshold (default 0.01) and vad_silence_timeout_ms (default 500) to VoiceDuplexConfig.
- zeroclaw-gateway/voice_duplex.rs: Add from_config() constructor, update new() to delegate. Update tests for EnergyVad behavior.
- zeroclaw-gateway/ws.rs: Pass VoiceDuplexConfig to session creation.

All 734 tests pass (api: 43, gateway: 138, config: 553).

Part of zeroclaw-labs#5896
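
The constructor wiring described here, with new() delegating to from_config(), might be sketched as below. This is a simplified stand-in under stated assumptions: the `Vad` trait is reduced to a hypothetical boolean `is_speech` method, `EnergyVad` is a stateless threshold check rather than the real state machine, and `process_frame` returning `bool` plus the public `silence_timeout_ms` field are invented for the example.

```rust
/// Assumed subset of the config; defaults match the values in the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct VoiceDuplexConfig {
    pub vad_energy_threshold: f32,
    pub vad_silence_timeout_ms: u64,
}

impl Default for VoiceDuplexConfig {
    fn default() -> Self {
        Self { vad_energy_threshold: 0.01, vad_silence_timeout_ms: 500 }
    }
}

/// Simplified stand-in for the Vad trait (method name is hypothetical).
pub trait Vad {
    fn is_speech(&mut self, samples: &[f32]) -> bool;
}

/// Placeholder detector that never reports speech.
pub struct NoopVad;

impl Vad for NoopVad {
    fn is_speech(&mut self, _samples: &[f32]) -> bool {
        false
    }
}

/// Stateless threshold check; the real EnergyVad also tracks a silence timeout.
pub struct EnergyVad {
    threshold: f32,
}

impl Vad for EnergyVad {
    fn is_speech(&mut self, samples: &[f32]) -> bool {
        if samples.is_empty() {
            return false;
        }
        let mean_sq = samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32;
        mean_sq.sqrt() >= self.threshold
    }
}

pub struct VoiceDuplexSession {
    vad: Box<dyn Vad>,
    pub silence_timeout_ms: u64,
}

impl VoiceDuplexSession {
    /// Build a session whose VAD parameters come from the config.
    pub fn from_config(cfg: &VoiceDuplexConfig) -> Self {
        Self {
            vad: Box::new(EnergyVad { threshold: cfg.vad_energy_threshold }),
            silence_timeout_ms: cfg.vad_silence_timeout_ms,
        }
    }

    /// Kept for existing callers; delegates to the default config.
    pub fn new() -> Self {
        Self::from_config(&VoiceDuplexConfig::default())
    }

    pub fn process_frame(&mut self, samples: &[f32]) -> bool {
        self.vad.is_speech(samples)
    }
}
```

Holding the detector as `Box<dyn Vad>` is one way to keep NoopVad usable alongside EnergyVad; whether the PR uses a boxed trait object or a generic parameter is not stated.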

Audacity88 · 2026-05-01T17:42:49Z

Thanks for the PR. Before I do a full review, could you please update this branch against current master and resolve the merge conflicts?

GitHub currently reports the PR as dirty. The checks on the current head are green, which is helpful, but because this is a risk: high / size: XL voice-duplex change stacked on the related voice work, I would rather review it on a conflict-free head than spend the full pass on a stale merge state.

Once the branch is updated, please refresh the validation evidence in the PR body if anything changes during conflict resolution. The main checks I would expect to still be relevant are:

cargo fmt --all -- --check
cargo clippy --locked --all-targets -- -D clippy::correctness
cargo test

After that I can review the current energy-VAD / voice-duplex integration diff properly.

Frank Hurt and others added 7 commits April 20, 2026 06:23

docs(voice): fix handle_voice_event doc comment to match Option return type (zeroclaw-labs#5896)

7e2c54f

fix(voice): resolve clippy warnings for strict CI (-D warnings) (zeroclaw-labs#5896)

ec3857c

hurtdidit requested review from JordanTheJet and theonlyhennygod as code owners April 21, 2026 18:21

theonlyhennygod mentioned this pull request Apr 22, 2026

feat(voice): add Vad trait, VoiceEvent protocol, and gateway-voice-duplex feature flag (#5896) #5942

Merged

Merge branch 'master' into feat/energy-vad

548183c

github-actions Bot mentioned this pull request Apr 24, 2026

🦞 OpenClaw Ecosystem Daily Report 2026-04-24 gsscsd/big_model_radar#236

Open

singlerider requested a review from Audacity88 April 29, 2026 08:17

singlerider added the labels enhancement (New feature or request), risk: high (Auto risk: security/runtime/gateway/tools/workflows), size: XL (Auto size: >1000 non-doc changed lines), channel (Auto scope: src/channels/** changed), and needs-author-action (Author action required before merge) on Apr 29, 2026

singlerider mentioned this pull request Apr 29, 2026

feat(voice): speech capture buffer + STT dispatch (#5896) #5978

Open

theonlyhennygod added this to the v0.7.7 milestone May 3, 2026

theonlyhennygod mentioned this pull request May 3, 2026

[Feature]: Track: v0.7.7 — Desktop app (Tauri) parity, menu bar, macOS accessibility #6343

Open

36 tasks

This was referenced May 4, 2026

🦞 OpenClaw Ecosystem Daily Report 2026-05-04 gsscsd/big_model_radar#291

Open

🦞 OpenClaw Ecosystem Daily Report 2026-05-04 borq168/big_model_radar#130

Open

🦞 OpenClaw Ecosystem Digest 2026-05-04 borq168/big_model_radar#133

Open

hurtdidit wants to merge 8 commits into zeroclaw-labs:master from hurtdidit:feat/energy-vad

4 participants
