fix(observability): restore broken SSE /logs stream; add build-stamped version and health pulse for remote/Docker deployments #6553
Open
WareWolf-MoonWall wants to merge 1 commit into
The gateway's /logs page has been functionally broken since it was built.
The SSE connection itself opened successfully (showing the green 'Connected'
indicator), but no agent events ever appeared. Every agent entry point
(process_message, agent::run(), the cron scheduler, and the heartbeat
worker) called create_observer() internally and discarded the result, so
the BroadcastObserver wired to the SSE bus was never reached. 'Waiting for
events...' was the permanent state for all users.
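
The failure mode and the shape of the fix can be sketched in a few lines. This is a minimal illustration, not the project's code: the Observer trait, the in-memory BroadcastObserver, NoopObserver, and the run_before/run_after functions are simplified, hypothetical stand-ins. Only the idea of passing an Option<Arc<dyn Observer>> into the entry point comes from this PR.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, simplified stand-in for the project's Observer trait.
trait Observer: Send + Sync {
    fn record(&self, event: &str);
}

/// Stand-in for the SSE-backed observer: collects events in memory.
struct BroadcastObserver {
    sent: Mutex<Vec<String>>,
}

impl Observer for BroadcastObserver {
    fn record(&self, event: &str) {
        self.sent.lock().unwrap().push(event.to_string());
    }
}

/// Stand-in for the default observer an entry point builds for itself.
struct NoopObserver;

impl Observer for NoopObserver {
    fn record(&self, _event: &str) {}
}

fn create_observer() -> Arc<dyn Observer> {
    Arc::new(NoopObserver)
}

/// Broken shape: the entry point always builds its own observer,
/// so the observer held by the caller never sees any event.
fn run_before() {
    let observer = create_observer();
    observer.record("turn_complete");
}

/// Fixed shape: the caller may inject an observer; None keeps the old default.
fn run_after(observer: Option<Arc<dyn Observer>>) {
    let observer = observer.unwrap_or_else(create_observer);
    observer.record("turn_complete");
}

fn main() {
    let bus = Arc::new(BroadcastObserver { sent: Mutex::new(Vec::new()) });

    run_before();
    println!("before fix: {} event(s) on the bus", bus.sent.lock().unwrap().len());

    let injected: Arc<dyn Observer> = bus.clone();
    run_after(Some(injected));
    println!("after fix: {} event(s) on the bus", bus.sent.lock().unwrap().len());
}
```

Passing None reproduces the previous behaviour, which is why CLI and test call sites can stay unchanged apart from the extra argument.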
This is especially impactful for remote and Docker deployments where the
terminal is inaccessible: the WebUI and its configured channels are the
only windows into runtime behaviour. A permanently empty /logs page, no
version indicator, and no proof-of-life signal between agent turns create
a serious trust and UX gap.
What this PR fixes and adds:
FIXES (broken behaviour):
- Wire broadcast observer through process_message() — gateway webhook/WS
agent activity now reaches /logs
- Introduce SseBroadcastObserver in zeroclaw-runtime so cron scheduler and
heartbeat worker can emit to SSE without importing gateway types
- Wire broadcast observer through agent::run() — cron-triggered and
heartbeat-triggered agent runs now reach /logs
- Extend BroadcastObserver to forward HeartbeatTick, ChannelMessage,
TurnComplete, and LlmResponse (previously silently dropped by _ => return)
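
The last fix above concerns a catch-all match arm. A rough sketch of the difference, assuming a trimmed-down event enum: ObserverEvent, its ToolCall variant, the "tool_call" string, and both function names are hypothetical. The four other variant names and their snake_case event types are the ones named in this PR.

```rust
/// Hypothetical, trimmed-down event enum; the real one carries fields
/// and has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ObserverEvent {
    ToolCall, // assumed example of an event that was already forwarded
    HeartbeatTick,
    ChannelMessage,
    TurnComplete,
    LlmResponse,
}

/// Old shape: anything not listed falls into the catch-all and is dropped
/// (None here stands in for the `_ => return` arm).
fn sse_type_before(event: ObserverEvent) -> Option<&'static str> {
    match event {
        ObserverEvent::ToolCall => Some("tool_call"),
        _ => None,
    }
}

/// New shape: every variant is mapped to an SSE event type.
fn sse_type_after(event: ObserverEvent) -> Option<&'static str> {
    match event {
        ObserverEvent::ToolCall => Some("tool_call"),
        ObserverEvent::HeartbeatTick => Some("heartbeat_tick"),
        ObserverEvent::ChannelMessage => Some("channel_message"),
        ObserverEvent::TurnComplete => Some("turn_complete"),
        ObserverEvent::LlmResponse => Some("llm_response"),
    }
}

fn main() {
    for event in [ObserverEvent::HeartbeatTick, ObserverEvent::LlmResponse] {
        println!(
            "{:?}: before = {:?}, after = {:?}",
            event,
            sse_type_before(event),
            sse_type_after(event)
        );
    }
}
```

Writing the match without a wildcard arm, as in the second function, means the compiler rejects the code when a new variant is added but not handled. That is one way to keep events from being dropped silently again, though the PR does not say whether it takes this approach.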
ADDS (observability for remote/Docker users):
- 30-second health pulse task in run_gateway broadcasts {type:"pulse",
uptime_seconds, components} so the dashboard shows the daemon is alive
even when the agent is idle — replaces silence with a proof-of-life signal
- gateway_started, daemon_reload, daemon_shutdown, and component_restart
events so daemon lifecycle is visible without terminal access
- version field in /api/status composed from CARGO_PKG_VERSION + git SHA
+ dirty flag (captured at build time via build.rs) — shows as
'0.7.5 (771cbbc)' for source builds, '0.7.5 (771cbbc, dirty)' for
dirty builds, and plain '0.7.5' when git is unavailable (tarball/CI)
- Version status card in Dashboard Overview tab (5th in the grid)
- Logs.tsx banner updated to accurately describe what now flows over SSE
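
The three version formats listed above can be produced by a small pure function. The sketch below is an assumption about how the pieces might be combined, not the PR's implementation: compose_version is a hypothetical name. In a real build the package version would come from env!("CARGO_PKG_VERSION"), and the SHA and dirty flag from variables that build.rs exports with cargo:rustc-env and the crate reads with option_env!.

```rust
/// Compose a display version from the package version, an optional git SHA,
/// and a dirty flag. The dirty flag is ignored when no SHA is available.
fn compose_version(pkg: &str, sha: Option<&str>, dirty: bool) -> String {
    match sha {
        Some(sha) if dirty => format!("{pkg} ({sha}, dirty)"),
        Some(sha) => format!("{pkg} ({sha})"),
        None => pkg.to_string(),
    }
}

fn main() {
    // Literals are used so the sketch runs outside Cargo.
    println!("{}", compose_version("0.7.5", Some("771cbbc"), false)); // 0.7.5 (771cbbc)
    println!("{}", compose_version("0.7.5", Some("771cbbc"), true)); // 0.7.5 (771cbbc, dirty)
    println!("{}", compose_version("0.7.5", None, false)); // 0.7.5
}
```

Keeping the SHA optional lets tarball and CI builds without git fall back to the plain package version instead of failing the build.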
API surface changes (all workspace-internal, publish = false):
- execute_job_now gains Option<Arc<dyn Observer>> parameter; gateway caller updated
- agent::run() gains observer: Option<Arc<dyn Observer>> parameter; all five
call sites updated (None for CLI and integration tests)
Summary
- master
- The /logs page in the gateway WebUI has been functionally broken since it was built. The SSE connection itself opened successfully (the browser shows the green "Connected" indicator), but no agent events ever appeared. Every agent entry point (process_message, agent::run(), the cron scheduler, and the heartbeat worker) called create_observer() internally and discarded the result, so the BroadcastObserver wired to the SSE bus was never reached. "Waiting for events…" was the permanent state for all users.
- A permanently empty /logs page, no version indicator, and no proof-of-life signal between agent turns is a significant trust and UX gap.
- execute_job_now (public API): gains a third Option<Arc<dyn Observer>> parameter. This is breaking for external callers, but the crate is publish = false and the only workspace caller (gateway api.rs) is updated in this PR.
- agent::run() (public API): likewise gains an observer parameter; all five workspace call sites are updated (None for CLI and integration tests, Some(observer) for cron and heartbeat).
- SSE gains new event types (pulse, gateway_started, daemon_*, component_restart, heartbeat_tick, channel_message, turn_complete, llm_response); unknown types fall through gracefully in the frontend.
- spawn_component_supervisor: gains an event_tx parameter. It is a private function, so there is zero external blast radius.
- risk: high, gateway, observability, agent, daemon, cron, heartbeat

Validation Evidence (required)
cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
cargo test
- cargo fmt --all -- --check: ✅ clean exit, no output
- cargo clippy --all-targets -- -D warnings: ✅ Finished dev profile [unoptimized + debuginfo] target(s) in 1m 42s; zero warnings, zero errors
- cargo test: ✅ all tests pass, including the scheduled_no_conversation_leak_5415 integration test, which exercises the changed agent::run() signature
- npx tsc --noEmit (web/): ✅ zero TypeScript errors
- cargo check -p zeroclaw-gateway after adding emit_git_info() to build.rs: ✅ compiles in 2.85s
- agent::run() call sites and the full cron scheduler chain (run() → catch_up_overdue_jobs → process_due_jobs → execute_and_persist_job → execute_job_with_retry → run_agent_job → agent::run()). Confirmed the i18n.ts diff is exactly one line ('dashboard.version': 'Version'), with no collateral formatting churn.

Security & Privacy Impact (required)
- CARGO_PKG_VERSION + git SHA + dirty flag, all build-time constants

Compatibility (required)
- execute_job_now and agent::run() signatures changed. Both crates are publish = false; all workspace callers are updated in this PR.
- /api/status gains an additive version field; SSE gains new event types. Both are backwards-compatible at the consumer level. Custom SSE clients that consume /api/events will receive new event types they can safely ignore.

Rollback (required for risk: medium and risk: high)
- git revert 866861b5e
- /logs returns to "Waiting for events…" permanently; version card shows "—"; pulse events stop; cron/heartbeat agent activity disappears from the stream.