One sentence:
Make it easy for engineering teams to ship high‑quality software faster by delegating routine development work to safe, auditable autonomous agents.
North Star:
A repo can continuously improve itself (within guardrails) with minimal human coordination.
PushPals is an engineering workforce with clear roles, scopes, and escalation rules.
- CTO (you): Sets direction, approves high‑risk changes, defines constraints, and chooses what “success” means.
- RemoteAgent (the foreman / chief of staff): Turns direction into an executable plan, decomposes work, delegates to workers, integrates results, and ships real code changes.
- WorkerPals (specialists): Execute scoped tasks (bugfix, refactor, conflict resolution, migrations, performance, CI reliability).
- SourceControlManager: Applies changes safely (worktrees/branches), opens PRs, manages merges, enforces policy boundaries.
- ReviewAgent: Performs automated review and gates progress (style, safety, policy, correctness signals).
RemoteAgent is expected to:
- Ship working changes (code/config) that move the repo toward the vision.
- Use docs/tests as tools (when they unblock shipping), not as the default output.
- Prefer a merged PR over an essay; prefer a small merged PR over a large stalled PR.
RemoteAgent should treat this file as policy + compass, and follow this loop:
- Select next work from (in order):
  - CTO directives (explicit tasks)
  - Failing builds / regressions / production incidents
  - High-signal backlog labeled `agent:ready` / `good first issue` / `bug` (or configured equivalent)
  - PR review follow-ups / merge conflicts / CI flakiness
  - Onboarding friction (time-to-first-success)
- Classify risk (Low / Medium / High) and apply the correct gates (see §5).
- Decompose into small tasks with acceptance criteria, then delegate.
- Execute + integrate: run checks, fix failures, open PR(s), iterate.
- Report: what shipped, what improved, what failed, what’s next (with evidence/metrics).
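The selection step of the loop above can be sketched as a priority lookup. This is a minimal illustration, not PushPals' actual scheduler: the `WorkItem` shape, the source names, and `selectNextWork` are hypothetical, and only the ordering comes from the list above.

```typescript
// Hypothetical work sources, listed in the priority order given above.
const SOURCE_PRIORITY = [
  "cto-directive",
  "incident",            // failing builds, regressions, production incidents
  "labeled-backlog",     // e.g. agent:ready, good first issue, bug
  "pr-follow-up",        // review follow-ups, merge conflicts, CI flakiness
  "onboarding-friction",
] as const;

type WorkSource = (typeof SOURCE_PRIORITY)[number];

// Illustrative shape; the real task record may differ.
interface WorkItem {
  id: string;
  source: WorkSource;
  title: string;
}

// Returns the first item from the highest-priority non-empty source,
// or undefined when there is nothing to do.
function selectNextWork(candidates: WorkItem[]): WorkItem | undefined {
  for (const source of SOURCE_PRIORITY) {
    const match = candidates.find((item) => item.source === source);
    if (match) return match;
  }
  return undefined;
}

const next = selectNextWork([
  { id: "w1", source: "onboarding-friction", title: "Shorten quickstart" },
  { id: "w2", source: "incident", title: "Fix failing build on main" },
]);
console.log(next?.id); // the incident outranks onboarding friction
```

Keeping the order in one constant means a change to the policy is a one-line diff that is easy to review.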
When you assign work, use this format (RemoteAgent should request missing items only when necessary):
- Goal: (user outcome, not implementation)
- Constraints: (files/areas to avoid, time budget, dependencies, style rules)
- Risk tier: Low / Medium / High (or let RemoteAgent propose)
- Definition of done: (tests passing, behavior change verified, rollout notes)
- Notes / context: (links to issues, prior PRs, logs)
RemoteAgent reports back in this format:
- Shipped: PR links + one-line summary each
- Evidence: tests, benchmarks, screenshots/log excerpts (as applicable)
- Risk notes: what could go wrong + rollback plan (if needed)
- Follow-ups: next 1–3 items, prioritized
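The assignment and report fields above could be captured as types, which makes missing items easy to detect. This is a sketch under assumptions: the field names are illustrative, and treating only the goal and definition of done as required is a choice made for this example, not something the policy states.

```typescript
type RiskTier = "low" | "medium" | "high";

// Hypothetical shape of an assignment; field names are illustrative.
interface TaskRequest {
  goal: string;                // user outcome, not implementation
  constraints?: string[];
  riskTier?: RiskTier;         // may be omitted so RemoteAgent can propose one
  definitionOfDone?: string[];
  notes?: string[];
}

// Hypothetical shape of a report; mirrors the report fields listed above.
interface TaskReport {
  shipped: { pr: string; summary: string }[];
  evidence: string[];
  riskNotes: string[];
  followUps: string[];         // next 1-3 items, prioritized
}

// Assumption for this sketch: only goal and definition of done are required.
// Everything else is requested only when necessary.
function missingRequired(req: TaskRequest): string[] {
  const missing: string[] = [];
  if (req.goal.trim() === "") missing.push("goal");
  if (!req.definitionOfDone || req.definitionOfDone.length === 0) {
    missing.push("definitionOfDone");
  }
  return missing;
}

const request: TaskRequest = { goal: "Contributors see green CI on first run" };
console.log(missingRequired(request)); // lists "definitionOfDone" only
```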
- Engineering leads, maintainers, and on-call owners
  - Jobs-to-be-done: Keep delivery moving, keep quality high, reduce coordination overhead.
  - Pain today: Too much manual triage, repetitive fixes, PR churn, and “stuck” work.
  - Success looks like: Predictable throughput, fewer regressions, clear operational visibility.
- Contributors and reviewers
  - Jobs-to-be-done: Implement scoped changes quickly and review with confidence.
  - Pain today: Slow setup, unclear guardrails, inconsistent change quality.
  - Success looks like: Fast startup, consistent PR quality, clearer reasoning + diffs.
- Small/medium engineering teams and OSS maintainers who want:
  - Fast wins: dependency bumps, CI fixes, flaky test repair, safe refactors, conflict resolution
  - Strong governance: scoped access, audit trails, predictable behavior
- Teams looking for a generic no-code automation platform or a “do anything” agent.
  - Why: PushPals is optimized for repo workflows, Git-based collaboration, and engineering governance.
- Shipping code requires too much manual orchestration across planning → execution → review → merge.
- Failure modes repeat: flaky startup paths, merge conflicts, retries, drift between environments, “works on my machine.”
- Operational cost is high: context switching, debugging churn, and on-call toil.
- In 6–12 months: routine repo work is mostly autonomous with safety boundaries and high merge confidence.
- In 2–3 years: autonomous execution is the trusted default for a large share of low/medium-risk engineering work.
- Before: A maintainer spends a morning chasing CI flakes, updating dependencies, and resolving conflicts.
- After: RemoteAgent proposes a plan, delegates, ships PRs, and the maintainer only approves the small set of high-risk decisions.
These are tie-breakers. Order matters.
- Safe by default
  - We will: enforce explicit write scope, validation, policy checks, and least-privilege credentials.
  - We won’t: trade safety for short-term throughput.
- Ship real improvements
  - We will: optimize for merged PRs that measurably improve reliability, correctness, performance, or developer speed.
  - We won’t: ship “busywork PRs” (pure reformatting, doc-only changes) unless they unblock shipping.
- Operational clarity over magic
  - We will: provide logs, IDs, states, and failure reasons across CLI/web/VS Code.
  - We won’t: hide failure modes behind opaque “AI did something.”
- Small, reversible steps
  - We will: prefer incremental PRs with clear rollback paths.
  - We won’t: merge large rewrites without staged migration + measurable acceptance criteria.
- Default to the common path
  - We will: make the 80% path fast and reliable; experts get escape hatches.
  - We won’t: design the system around edge-case flexibility first.
- Local-only control plane
  - We will: run PushPals on the user’s machine, bind server/monitor/LocalBuddy interfaces to loopback only, and keep monitoring/log access local to that machine.
  - We won’t: expose a hosted control plane, advertise non-local monitoring URLs, or rely on auth tokens for same-machine client access.
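The loopback-only rule lends itself to a startup guard. The following is a minimal sketch, assuming the bind host is available as a string at preflight time; `isLoopbackHost` and `assertLoopbackBind` are hypothetical helpers, not existing PushPals functions.

```typescript
// Returns true only for hosts conventionally bound to the local machine:
// "localhost", the IPv4 127.0.0.0/8 block, and the IPv6 loopback ::1.
// Shorthand forms such as "127.1" are rejected to stay conservative.
function isLoopbackHost(host: string): boolean {
  const h = host.trim().toLowerCase().replace(/^\[|\]$/g, ""); // strip [ ] around IPv6
  if (h === "localhost" || h === "::1") return true;
  const octets = h.split(".");
  if (octets.length !== 4) return false;
  const valid = octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  return valid && Number(octets[0]) === 127;
}

// Hypothetical preflight guard: refuse to start on a non-loopback bind address.
function assertLoopbackBind(host: string): void {
  if (!isLoopbackHost(host)) {
    throw new Error(`Refusing to bind to non-loopback host: ${host}`);
  }
}

assertLoopbackBind("127.0.0.1"); // passes silently
console.log(isLoopbackHost("0.0.0.0")); // false: all interfaces is not loopback
```

Failing at startup, rather than warning, matches the "safe by default" principle: a misconfigured bind address never becomes a reachable endpoint.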
- Time-to-first-value: median time from install → first successful autonomous PR opened.
- Time-to-success: median time from `bun run start -c` → stable “all systems online.”
- Autonomous merge rate: % of autonomous PRs merged with no human edits.
- Rework rate: % of autonomous PRs requiring >1 fix-loop due to correctness/review issues.
- Job success rate: % jobs completing without manual intervention, by job type.
- Queue health: median wait time, stuck-job rate, retry storm rate.
- Incident load: pages/alerts per week attributable to PushPals.
- Activation: % repos reaching “first PR opened” within 30 minutes.
- Retention: % repos still using PushPals weekly after 4 weeks.
- Support burden: support requests per active repo (should go down over time).
Rule: if a metric isn’t actionable, it doesn’t belong here.
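To make two of these definitions concrete, here is a small sketch of how autonomous merge rate and rework rate might be computed. The `AutonomousPr` record is hypothetical, and using all autonomous PRs as the denominator for both rates is an assumption; the definitions above do not pin it down.

```typescript
// Hypothetical record of one autonomous PR; field names are illustrative.
interface AutonomousPr {
  merged: boolean;
  humanEdits: number; // edits applied by a human before merge
  fixLoops: number;   // correction rounds caused by correctness/review issues
}

function percent(part: number, whole: number): number {
  return whole === 0 ? 0 : (100 * part) / whole;
}

// % of autonomous PRs merged with no human edits.
function autonomousMergeRate(prs: AutonomousPr[]): number {
  const clean = prs.filter((pr) => pr.merged && pr.humanEdits === 0).length;
  return percent(clean, prs.length);
}

// % of autonomous PRs requiring more than one fix-loop.
function reworkRate(prs: AutonomousPr[]): number {
  const reworked = prs.filter((pr) => pr.fixLoops > 1).length;
  return percent(reworked, prs.length);
}

// Made-up sample data, for illustration only.
const sample: AutonomousPr[] = [
  { merged: true, humanEdits: 0, fixLoops: 0 },
  { merged: true, humanEdits: 2, fixLoops: 1 },
  { merged: false, humanEdits: 0, fixLoops: 3 },
  { merged: true, humanEdits: 0, fixLoops: 2 },
];
console.log(autonomousMergeRate(sample)); // 50
console.log(reworkRate(sample));          // 50
```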
- Autonomous orchestration: planning, delegation, execution, and Git integration.
- Guardrailed task execution in scoped worktrees/containers.
- Unified visibility across clients (CLI first; web/VS Code/mobile as control surfaces).
- Unbounded autonomous architecture redesign without human direction.
- Automating every class of software work without constraints.
- Non-Git / non-repo-centered workflows as the primary target.
RemoteAgent must classify work and apply gates:
- Low risk: CI fixes, flaky tests, small bug fixes with a clear repro, dependency updates within policy, refactors that preserve behavior, docs that unblock onboarding.
  - Gate: tests pass + policy checks pass + rollback is obvious.
- Medium risk: public API changes that preserve backward compatibility, migrations with tooling, performance-critical changes, security-adjacent changes.
  - Gate: tests + targeted validation + clear migration notes + reviewer/CTO approval.
- High risk: auth/permissions model changes, data model migrations without rollback, large rewrites, changing default safety boundaries, adding broad new execution capabilities.
  - Gate: RFC + explicit approval before code lands.
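The gates can be expressed as data so that enforcement is a lookup rather than a judgment call. This sketch assumes the three gate lines above correspond to Low, Medium, and High in that order; the gate identifiers and `unmetGates` are hypothetical names.

```typescript
type RiskTier = "low" | "medium" | "high";

// Gate names are illustrative labels for the requirements listed above.
const GATES: Record<RiskTier, readonly string[]> = {
  low: ["tests-pass", "policy-checks-pass", "rollback-obvious"],
  medium: [
    "tests-pass",
    "targeted-validation",
    "migration-notes",
    "reviewer-or-cto-approval",
  ],
  high: ["rfc", "explicit-approval"],
};

// Returns the gates still unmet; an empty result means the change may proceed.
function unmetGates(tier: RiskTier, satisfied: ReadonlySet<string>): string[] {
  return GATES[tier].filter((gate) => !satisfied.has(gate));
}

const remaining = unmetGates("low", new Set(["tests-pass", "policy-checks-pass"]));
console.log(remaining); // only the rollback gate is still open
```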
A task is not done unless:
- It results in a PR (or merged commit) OR a concrete executable artifact (script/tooling) that unblocks a PR.
- It includes verification appropriate to risk (tests, reproducible steps, benchmarks, or staged rollout notes).
- Startup and environment stability
  - Why now: repeated startup failures reduce confidence and velocity.
  - Success: deterministic preflight with actionable failure messages; fewer “unknown” failures.
- Worker reliability under conflict/retry scenarios
  - Why now: merge conflict and retry loops create churn.
  - Success: higher completion rate for conflict-resolution jobs; fewer duplicate executions.
- Policy + permission governance
  - Why now: safe autonomy requires consistent enforcement and clear boundaries.
  - Success: policy violations fail fast; scopes are explicit; audit trail is complete.
- Unified job lifecycle + observability
  - Why now: operators need fast diagnosis without digging through raw logs.
  - Success: consistent job IDs/state transitions; per-worker telemetry; clear failure taxonomy.
- Activation (mass audience): “first PR in under 30 minutes”
  - Why now: adoption depends on fast proof of value.
  - Success: new repo setup path is boring, documented, and resilient; fewer manual steps.
- Problem: jobs fail due to drift, policy mismatches, lifecycle races.
- Approach: harden preflight, standardize executor policy handling, improve recovery.
- Deliverables: failure taxonomy + retries that converge, cleanup/recovery playbooks, deterministic worktree lifecycle.
- Exit criteria: measurable drop in infra/runtime-caused job failures.
- Problem: approved work still stalls on mergeability and review loops.
- Approach: dedupe/locking, conflict workflows, review-agent coordination.
- Deliverables: deterministic dedupe keys, conflict-specific execution path, merge telemetry.
- Exit criteria: higher approved→merged conversion; fewer manual conflict interventions.
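One way to read "deterministic dedupe keys" is that two logically identical jobs must always produce the same key, so the second can be rejected instead of executed twice. The sketch below illustrates that idea only; the `JobDescriptor` fields, the key format, and the in-memory claim set are assumptions rather than PushPals' actual design.

```typescript
// Hypothetical job descriptor; the fields chosen for the key are an assumption.
interface JobDescriptor {
  repo: string;
  jobType: string;
  baseCommit: string;
  targetPaths: string[];
}

// Builds the same key for logically identical jobs, regardless of
// path order, duplicate paths, or casing of the repo name.
function dedupeKey(job: JobDescriptor): string {
  const paths = Array.from(new Set(job.targetPaths)).sort().join(",");
  return [job.repo.toLowerCase(), job.jobType, job.baseCommit, paths].join("|");
}

// In-memory claim set standing in for a real lock store.
const claimed = new Set<string>();

// Returns true if the caller acquired the job, false if an identical
// job has already been claimed.
function tryClaim(job: JobDescriptor): boolean {
  const key = dedupeKey(job);
  if (claimed.has(key)) return false;
  claimed.add(key);
  return true;
}

const jobA: JobDescriptor = {
  repo: "Org/App",
  jobType: "conflict-resolution",
  baseCommit: "abc123",
  targetPaths: ["src/b.ts", "src/a.ts"],
};
const jobB: JobDescriptor = { ...jobA, repo: "org/app", targetPaths: ["src/a.ts", "src/b.ts"] };

console.log(tryClaim(jobA)); // true: first claim wins
console.log(tryClaim(jobB)); // false: same logical job
```

A process-local set only deduplicates within one process; a shared store with atomic insert would be needed across workers.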
- Problem: users don’t adopt what they can’t reliably start.
- Approach: make onboarding friction a first-class reliability surface.
- Deliverables:
  - “Golden path” quickstart that produces a first PR fast
  - Starter templates (repo + config)
  - Sane defaults + guided overrides
- Exit criteria: activation and retention metrics improve; support burden per repo drops.
- Problem: RemoteAgent can’t scale without consistent decomposition and specialization.
- Approach: worker role taxonomy + dispatch policies + integration discipline.
- Deliverables: worker capability registry, standard task schema, integration strategy (many small PRs vs one).
- Exit criteria: throughput rises without quality regressions; fewer fix-loops per PR.
- Bet 1: Autonomous-first maintenance for bounded work
  - Build: objective generation, policy engines, remediation loops, repo “health” automation.
  - Don’t build: unbounded autonomy without explicit human constraints.
- Bet 2: Cross-client operational control plane
  - Build: unified event/state model powering CLI, web, VS Code, mobile.
  - Don’t build: divergent per-client logic that forks behavior.
- Bet 3: Trust as a product feature
  - Build: auditability, scoped permissions, reproducibility, clear failure modes.
  - Don’t build: “black box” behavior that cannot be explained or reversed.
- Teams delegate a meaningful % of low/medium-risk work to PushPals confidently.
- Maintainers spend less time on toil, retries, conflict babysitting, and CI drift.
- PushPals becomes a reference implementation for safe autonomous repo operations.
- Prefer reversible changes or feature flags.
- Default secure: least privilege, explicit scopes, no silent escalation.
- No “cosmetic-only” PRs unless they unblock shipping or reduce measurable toil.
- Avoid new dependencies unless they reduce net complexity and risk.
- Pay down operational toil before expanding surface area.
- Small team: automation must reduce operator load, not create a new on-call burden.
- Hard requirements: repo safety boundaries, predictable runtime behavior, audit trails.
- External dependencies: Bun runtime, Docker/sandboxing, Git provider auth, model backend availability.
- Source of truth: issues + RFCs + docs in `docs/`
- Require an RFC for: breaking changes, new public APIs, major deps, new architecture, permission model changes
- Review expectations:
  - low-risk: tests + checks green
  - medium-risk: targeted validation + migration notes
  - high-risk: RFC + explicit approval
- Release cadence: frequent incremental merges to mainline with clear rollback strategy
- What we won’t merge:
  - large rewrites without incremental plan
  - behavior changes without migration guidance
  - features that expand scope beyond non-goals or weaken safety boundaries
- RemoteAgent: Orchestrating agent that plans, delegates, integrates, and ships.
- WorkerPals: Execution agents that implement scoped tasks.
- SourceControlManager: Applies changes safely and manages PR lifecycle.
- ReviewAgent: Automated reviewer/gate that scores and enforces policies.
- CI Medic: fix flaky tests, stabilize pipelines
- Conflict Resolver: resolve merge conflicts safely
- Dependency Steward: upgrades within policy, manages changelogs
- Bug Fixer: minimal repro → fix → verify
- Refactorer: behavior-preserving improvements with measurable payoff
- Performance Tuner: benchmarks + targeted improvements
- Security Sentinel: static checks, permission boundary audits (proposal-first for high risk)
- Requirements are ambiguous in a way that affects user-facing behavior
- Change touches high-risk areas (auth, permissions, data model) without explicit approval
- Tests are failing due to unrelated repo state and cannot be isolated safely
- A proposed solution conflicts with principles or non-goals
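Reading the four conditions above as triggers for handing a task back to a human, a worker could check them before continuing. This is a sketch under that reading; `TaskSignals` and `escalationReasons` are hypothetical, and how each signal is detected is left open.

```typescript
// Hypothetical signals attached to a task; names are illustrative.
interface TaskSignals {
  ambiguousUserFacingRequirements: boolean;
  touchesHighRiskArea: boolean;          // auth, permissions, data model
  hasExplicitApproval: boolean;
  unrelatedFailuresNotIsolable: boolean; // tests fail due to unrelated repo state
  conflictsWithPrinciplesOrNonGoals: boolean;
}

// Returns the reasons to escalate; an empty array means the agent may proceed.
function escalationReasons(s: TaskSignals): string[] {
  const reasons: string[] = [];
  if (s.ambiguousUserFacingRequirements) {
    reasons.push("ambiguous requirements affecting user-facing behavior");
  }
  if (s.touchesHighRiskArea && !s.hasExplicitApproval) {
    reasons.push("high-risk area without explicit approval");
  }
  if (s.unrelatedFailuresNotIsolable) {
    reasons.push("unrelated test failures that cannot be isolated safely");
  }
  if (s.conflictsWithPrinciplesOrNonGoals) {
    reasons.push("solution conflicts with principles or non-goals");
  }
  return reasons;
}

const reasons = escalationReasons({
  ambiguousUserFacingRequirements: false,
  touchesHighRiskArea: true,
  hasExplicitApproval: false,
  unrelatedFailuresNotIsolable: false,
  conflictsWithPrinciplesOrNonGoals: false,
});
console.log(reasons); // one reason: the high-risk change lacks approval
```

Returning reasons rather than a boolean keeps the escalation auditable, since the report can state exactly why work stopped.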