feat: Kubernetes container runtime for agent spawning #2354

@netadmincmh-hash

Use case

Spawn per-session agent containers as Kubernetes pods on a user-provided cluster instead of local Docker / Apple Container. Today src/container-runtime.ts exports CONTAINER_RUNTIME_BIN = 'docker' and src/container-runner.ts shells out to docker run; on a host that's already a K8s control-plane node with everything else running in-cluster, the local-Docker path is the odd one out — different lifecycle, different observability, different resource controls.

Concretely my host is swarm0, a control-plane node in a 5-node Cilium cluster. All other personal workloads (convertx, pi-hole, netbird, portainer, etc.) live in-cluster behind nginx-ingress + cert-manager DNS-01. The nanoclaw host process runs as root (it's a dedicated node), and /pods/nanoclaw-v2/data/v2-sessions/ is on NFS. I just finished a v1→v2 migration on this host and discovered a couple of papercuts that a K8s runtime would dissolve:

  • The host writes session DBs as uid 0; the agent image's USER node (uid 1000) cannot write outbound.db on NFS without an explicit chown step (filed separately as fix(session-manager): chown new session dirs when host runs as root #2353).
  • The credential proxy binds to docker0 and containers reach it via host.docker.internal — both Docker-specific concepts that don't apply in K8s.
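To illustrate the second point, here is a rough sketch of how the proxy address might be resolved per runtime. The function name, the ProxyTarget shape, and the default Service name and namespace are all hypothetical, and the cluster.local suffix assumes the cluster uses the default DNS domain.

```typescript
// Hypothetical sketch: none of these names exist in nanoclaw today.
type Runtime = 'docker' | 'k8s';

interface ProxyTarget {
  runtime: Runtime;
  port: number;
  serviceName?: string; // assumed name of a ClusterIP Service fronting the proxy
  namespace?: string;   // assumed namespace the Service lives in
}

function credentialProxyUrl(target: ProxyTarget): string {
  if (target.runtime === 'docker') {
    // Docker-specific hostname that resolves to the host from inside a container.
    return `http://host.docker.internal:${target.port}`;
  }
  // In-cluster DNS name of a Service: <service>.<namespace>.svc.<cluster-domain>.
  const service = target.serviceName ?? 'credential-proxy';
  const namespace = target.namespace ?? 'default';
  return `http://${service}.${namespace}.svc.cluster.local:${target.port}`;
}
```

With a Service in front of the proxy, agent pods would no longer need to know about docker0 or host.docker.internal.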

What I think a runtime abstraction would need to handle

This is sketch-level — happy to refine if you want to take it on:

  1. Runtime selection: extend the existing Docker / Apple-Container split with a kubernetes mode (env-toggled, e.g. CONTAINER_RUNTIME=k8s). CONTAINER_RUNTIME_BIN becomes an abstract spawn function rather than a binary name.
  2. Pod template generation: replace args.push('-e', ...) / args.push('-v', ...) with PodSpec generation: env → env, volume mounts → volumes/volumeMounts, --user → securityContext.runAsUser, --add-host → hostAliases, --rm → an OnFailure Pod (or a Job).
  3. Volumes: session dirs need to be readable and writable by both the host and the agent pod. NFS PVs / PVCs work if the cluster has a CSI driver matching the host's NFS export; hostPath works if the host process and pods land on the same node (DaemonSet-style affinity). Mount-allowlist entries in ~/.config/nanoclaw/mount-allowlist.json would translate to additional PVCs or hostPath volumes.
  4. Credential proxy: currently http://host.docker.internal:CREDENTIAL_PROXY_PORT. In K8s, either a Service of type ClusterIP (proxy runs as a sidecar Deployment) or a host-network endpoint plus hostNetwork: true on agent pods.
  5. Heartbeat / DB visibility: /workspace/.heartbeat is currently a host bind-mount that the host process polls via fs.statSync. With pods, either keep the same NFS-backed path (works if you have a shared filesystem) or move heartbeat into outbound.db / a CRD / a watch on Pod conditions.
  6. Image build: container/build.sh builds locally and tags nanoclaw-agent-v2-<slug>:latest. With K8s, that image needs to be pushed to a registry the cluster can pull from (most clusters can't pull from the local Docker daemon). I have an in-cluster registry namespace; users without one would need a setup-time prompt to point at Docker Hub / GHCR / etc.
  7. Orphan cleanup: cleanupOrphans() scans docker ps --filter label=.... K8s equivalent: list Pods with the install-slug label and delete completed ones (or rely on Job.spec.ttlSecondsAfterFinished).
  8. Per-pod logs: --rm wipes container logs on exit today, which already makes debugging hard (docs/... mentions this). K8s pods retain logs until garbage collection, so a model without --rm would actually be easier to debug.
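
To make item 2 concrete, here is a minimal sketch of how the docker run arguments might map onto a Pod manifest. Everything named here (SpawnOptions, buildPodManifest, the mount-N volume names, the agent container name) is hypothetical and not taken from the nanoclaw source. It also assumes hostPath volumes, i.e. the host process and the pod land on the same node; an NFS-backed PVC would change only the volumes entries.

```typescript
// Hypothetical sketch: none of these names exist in nanoclaw today.
interface SpawnOptions {
  name: string;
  image: string;
  env: Record<string, string>;                              // docker: -e KEY=VALUE
  mounts: { hostPath: string; containerPath: string; readOnly?: boolean }[]; // docker: -v
  user?: number;                                            // docker: --user
  addHosts?: { hostname: string; ip: string }[];            // docker: --add-host
  labels?: Record<string, string>;                          // e.g. an install-slug label
}

// Only the subset of the core/v1 Pod shape that this sketch needs.
interface PodManifest {
  apiVersion: 'v1';
  kind: 'Pod';
  metadata: { name: string; labels: Record<string, string> };
  spec: {
    restartPolicy: 'Never' | 'OnFailure';
    securityContext?: { runAsUser: number };
    hostAliases?: { ip: string; hostnames: string[] }[];
    volumes: { name: string; hostPath: { path: string } }[];
    containers: {
      name: string;
      image: string;
      env: { name: string; value: string }[];
      volumeMounts: { name: string; mountPath: string; readOnly: boolean }[];
    }[];
  };
}

function buildPodManifest(opts: SpawnOptions): PodManifest {
  // Each -v mount becomes one volume plus one matching volumeMount.
  const volumes = opts.mounts.map((m, i) => ({
    name: `mount-${i}`,
    hostPath: { path: m.hostPath },
  }));
  const volumeMounts = opts.mounts.map((m, i) => ({
    name: `mount-${i}`,
    mountPath: m.containerPath,
    readOnly: m.readOnly ?? false,
  }));
  const env = Object.entries(opts.env).map(([name, value]) => ({ name, value }));
  const hosts = opts.addHosts ?? [];

  return {
    apiVersion: 'v1',
    kind: 'Pod',
    metadata: { name: opts.name, labels: opts.labels ?? {} },
    spec: {
      restartPolicy: 'OnFailure', // the "OnFailure Pod" option from item 2
      ...(opts.user !== undefined ? { securityContext: { runAsUser: opts.user } } : {}),
      ...(hosts.length > 0
        ? { hostAliases: hosts.map((h) => ({ ip: h.ip, hostnames: [h.hostname] })) }
        : {}),
      volumes,
      containers: [{ name: 'agent', image: opts.image, env, volumeMounts }],
    },
  };
}
```

The resulting object could be serialized with JSON.stringify and piped to kubectl apply -f -, or submitted through a Kubernetes client library. Which of the two fits better depends on where the abstraction boundary ends up.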

What I'd like out of this issue

Just to flag the use case and architectural shape. Not asking you to take it on — happy to do the work on a fork branch if you don't want it in trunk, but wanted to surface the design considerations first in case you've already thought about it or have opinions on the abstraction boundary.

Context

🤖 Generated with Claude Code
