12 operations testing and roadmap
github-actions[bot] edited this page Feb 20, 2026
Primary startup flow: `bun run start`
This runs preflight checks in `scripts/start.ts` before launching the full stack:
- required config files,
- LLM endpoint readiness and optional LM Studio bootstrap,
- integration branch/worktree safety,
- worker Docker image readiness,
- startup warmup job path.
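The preflight sequence above can be sketched as an ordered check runner. This is an illustrative sketch, not the actual contents of `scripts/start.ts`; the `runPreflight` helper, the check implementations, and the LM Studio endpoint URL are all assumptions.

```typescript
// Hypothetical sketch of the preflight pass that start.ts performs before
// launching the stack. Check names mirror the list above; implementations
// and endpoints are assumptions for illustration.
type Check = { name: string; run: () => boolean | Promise<boolean> };

// Runs checks in order, logs each result, and returns the names that failed.
async function runPreflight(checks: Check[]): Promise<string[]> {
  const failed: string[] = [];
  for (const check of checks) {
    const ok = await check.run();
    console.log(`${ok ? "ok  " : "FAIL"} ${check.name}`);
    if (!ok) failed.push(check.name);
  }
  return failed;
}

// Example checks (file names and the LM Studio port are assumptions):
const checks: Check[] = [
  { name: "required config files", run: () => true /* e.g. fs.existsSync(...) */ },
  {
    name: "LLM endpoint readiness",
    run: async () => {
      try {
        const res = await fetch("http://localhost:1234/v1/models");
        return res.ok;
      } catch {
        return false; // endpoint down: this is where an LM Studio bootstrap could kick in
      }
    },
  },
  // ...branch/worktree safety, worker Docker image, warmup job path
];

void runPreflight(checks);
```

Running the checks sequentially (rather than in parallel) keeps the log readable and lets the first failure point at the earliest broken dependency.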
Useful alternatives:
- `bun run dev:full` for direct multi-service launch.
- individual `*:only` scripts for targeted debugging.
- Full stack with preflights: `bun run start`
- Full stack without preflight wrapper: `bun run dev:full`
- Integration harness: `bun run test:integration`
- Eval harness: `bun run test:integration:eval`
- VS Code extension package/lint: `bun run vscode:client:lint`, `bun run vscode:client:package`
Baseline tooling:
- Bun
- Python 3.12+
- Docker (for default worker flow)
- Git (and optionally GitHub CLI for PR workflows)
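A quick way to confirm the baseline tooling is present is to probe each binary's `--version`. This is a hypothetical convenience script, not part of the repo; the tool list mirrors the bullets above.

```typescript
// Illustrative sketch: check that the baseline tools are on PATH by
// invoking `<tool> --version` and inspecting the exit status.
import { spawnSync } from "node:child_process";

function toolAvailable(cmd: string, args: string[] = ["--version"]): boolean {
  const result = spawnSync(cmd, args, { stdio: "ignore" });
  // result.error is set when the binary cannot be spawned at all (ENOENT).
  return result.error === undefined && result.status === 0;
}

for (const tool of ["bun", "python3", "docker", "git", "gh"]) {
  console.log(`${tool}: ${toolAvailable(tool) ? "found" : "missing"}`);
}
```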
Where to look first:
- service terminal logs from
dev:fullorstart. - server queue snapshots (
/requests,/jobs,/completions). - WorkerPals logs and job logs in Server job log endpoints.
- integration logs from SourceControlManager.
For session behavior:
- inspect the event stream (`/sessions/:id/events`, with cursor replay semantics).
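Cursor replay means a client that remembers the last cursor it processed can reconnect and receive the events it missed. A minimal polling consumer might look like the sketch below; the `cursor` query parameter, JSON response shape, and poll interval are assumptions, not the documented API.

```typescript
// Hypothetical consumer of /sessions/:id/events with cursor replay.
// Persisting the cursor across reconnects is what prevents event loss.
type SessionEvent = { cursor: number; type: string; payload: unknown };

// Pure helper: the cursor to resume from after processing a batch.
function advanceCursor(events: SessionEvent[], current: number): number {
  return events.length > 0 ? events[events.length - 1].cursor : current;
}

async function* sessionEvents(baseUrl: string, sessionId: string, fromCursor = 0) {
  let cursor = fromCursor;
  while (true) {
    const res = await fetch(`${baseUrl}/sessions/${sessionId}/events?cursor=${cursor}`);
    const batch: SessionEvent[] = await res.json();
    for (const ev of batch) yield ev;
    cursor = advanceCursor(batch, cursor); // store this to survive a disconnect
    if (batch.length === 0) await new Promise((r) => setTimeout(r, 1000));
  }
}
```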
When the system is "stuck", diagnose in this order:
1. Server health and session event progression.
2. Request queue movement.
3. Job queue movement and worker heartbeat.
4. Completion queue movement and SCM processing.
5. Client transport/reconnect state.
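The ordered checks above lend themselves to a one-shot snapshot script. The queue endpoint paths come from the "where to look first" section; the `/health` path and the response handling are assumptions.

```typescript
// Sketch of the stuck-system diagnosis order as a script: probe the server
// and each queue endpoint in sequence and report reachability.
const DIAGNOSIS_ORDER = [
  "/health",      // 1. server health (assumed path)
  "/requests",    // 2. request queue movement
  "/jobs",        // 3. job queue movement and worker heartbeat
  "/completions", // 4. completion queue movement / SCM processing
];

async function snapshot(baseUrl: string): Promise<Record<string, string>> {
  const report: Record<string, string> = {};
  for (const path of DIAGNOSIS_ORDER) {
    try {
      const res = await fetch(baseUrl + path);
      report[path] = res.ok ? "reachable" : `HTTP ${res.status}`;
    } catch {
      report[path] = "unreachable";
    }
  }
  return report;
}

// Usage: console.table(await snapshot("http://localhost:3000"));
```

Client transport/reconnect state (step 5) stays manual here, since it lives on the client side rather than behind a server endpoint.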
Testing:
- Unit/integration tests (TypeScript + Python harnesses).
- End-to-end integration harness:
`tests/integration/integration_controller.py`, `tests/integration/test_workerpals_e2e.py`
- Eval scenarios:
`tests/integration/eval_scenarios.swebench_like.json`
The integration controller supports two modes:
- `integration`: regular flow checks.
- `eval`: backend quality benchmark runs with scenario suites and budgets.
Pros:
- strong operational discipline and reproducibility,
- realistic benchmark path for backend quality.
Cons:
- setup complexity is higher than simple single-agent tools,
- Docker and multi-service orchestration increase local troubleshooting load.
Roadmap:
- Observability:
  - distributed trace IDs across the request/job/completion lifecycle,
  - richer metrics and dashboards for latency, failure categories, and retries.
- Reliability:
  - dead-letter queues and replay tools,
  - stronger backpressure and overload controls.
- DX:
  - one-command diagnostics report,
  - clearer startup failure classification with remediation hints.
- Autonomy quality:
  - objective outcome attribution loops,
  - model/prompt benchmark gating before production rollout.
- Platform hardening:
  - stricter schema evolution checks,
  - stronger integration of policy checks into CI.
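The first roadmap item, distributed trace IDs, amounts to minting one ID when a request is accepted and carrying it unchanged through the job and completion records so all three queues can be correlated. A minimal sketch, with all type and function names hypothetical:

```typescript
// Sketch of trace-ID propagation across the request/job/completion
// lifecycle: the ID is minted exactly once, at the edge, and copied
// (never re-minted) into every downstream record.
import { randomUUID } from "node:crypto";

type Traced<T> = T & { traceId: string };

function acceptRequest(prompt: string): Traced<{ prompt: string }> {
  return { prompt, traceId: randomUUID() }; // minted once here
}

function enqueueJob(req: Traced<{ prompt: string }>): Traced<{ work: string }> {
  return { work: req.prompt, traceId: req.traceId }; // propagated
}

function recordCompletion(job: Traced<{ work: string }>): Traced<{ result: string }> {
  return { result: `done: ${job.work}`, traceId: job.traceId }; // propagated
}

const req = acceptRequest("fix the flaky test");
const done = recordCompletion(enqueueJob(req));
console.log(done.traceId === req.traceId); // one ID across all three stages
```

With this in place, grepping the `/requests`, `/jobs`, and `/completions` snapshots for a single trace ID would reconstruct one request's full lifecycle.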