Context
Discovered while finishing a v1→v2 migration on a K8s control-plane node.
Use case
Spawn per-session agent containers as Kubernetes pods on a user-provided cluster instead of local Docker / Apple Container. Today src/container-runtime.ts exports CONTAINER_RUNTIME_BIN = 'docker' and src/container-runner.ts shells out to docker run; on a host that's already a K8s control-plane node with everything else running in-cluster, the local-Docker path is the odd one out — different lifecycle, different observability, different resource controls.
Concretely, my host is swarm0, a control-plane node in a 5-node Cilium cluster. All other personal workloads (convertx, pi-hole, netbird, portainer, etc.) live in-cluster behind nginx-ingress + cert-manager DNS-01. The nanoclaw host process runs as root (it's a dedicated node), and /pods/nanoclaw-v2/data/v2-sessions/ is on NFS. I just finished a v1→v2 migration on this host and discovered a couple of papercuts that a K8s runtime would dissolve:
The agent's USER node (uid 1000) cannot write outbound.db on NFS without an explicit chown step (filed separately as fix(session-manager): chown new session dirs when host runs as root #2353).
The credential proxy binds to docker0 and containers reach it via host.docker.internal — both Docker-specific concepts that don't apply in K8s.
What I think a runtime abstraction would need to handle
This is sketch-level — happy to refine if you want to take it on:
Runtime selection — extend the existing Docker / Apple-Container split with a kubernetes mode (env-toggled, e.g. CONTAINER_RUNTIME=k8s). CONTAINER_RUNTIME_BIN becomes an abstract spawn function rather than a binary name (interface sketch after this list).
Pod template generation — replace args.push('-e', ...) / args.push('-v', ...) with PodSpec generation: env → env, volume mounts → volumes/volumeMounts, --user → securityContext.runAsUser, --add-host → hostAliases, --rm → a Pod with restartPolicy: Never or OnFailure (or a Job). PodSpec sketch after this list.
Volumes — session dirs need to be readable+writable by both the host and the agent pod. NFS PVs / PVCs work if the cluster has a CSI driver matching the host's NFS export, or hostPath if the host process and pods land on the same node (DaemonSet-style affinity). Mount-allowlist entries in ~/.config/nanoclaw/mount-allowlist.json would translate to additional PVCs or hostPath volumes.
Credential proxy — currently http://host.docker.internal:CREDENTIAL_PROXY_PORT. In K8s this becomes either a ClusterIP Service (the proxy running as its own Deployment) or a host-network endpoint plus hostNetwork: true on agent pods.
Heartbeat / DB visibility — /workspace/.heartbeat is currently a host bind-mount the host process polls via fs.statSync. With pods, either keep the same NFS-backed path (works if you have a shared filesystem) or move heartbeat into outbound.db / a CRD / a watch on Pod conditions (watch sketch after this list).
Image build — container/build.sh builds locally and tags nanoclaw-agent-v2-<slug>:latest. With K8s, that image needs to be pushed to a registry the cluster can pull from (most clusters can't pull from the local Docker daemon). I have an in-cluster registry namespace; users without one would need a setup-time prompt to point at Docker Hub / GHCR / etc.
Orphan cleanup — cleanupOrphans() scans docker ps --filter label=.... K8s equivalent: list Pods with the install-slug label and delete completed ones (or rely on Job.spec.ttlSecondsAfterFinished); sketch after this list.
Per-pod logs — --rm wipes container logs on exit today, which already makes debugging hard (docs/... mentions this). K8s pods retain logs until garbage collection; a model without --rm would actually be easier.
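To make the abstraction boundary concrete, here is a minimal TypeScript sketch of the runtime-selection bullet. Every name in it (SpawnOpts, ContainerRuntime, selectRuntime, the mode strings) is hypothetical, not nanoclaw's existing API:

```ts
// Hypothetical runtime abstraction: CONTAINER_RUNTIME_BIN becomes a spawn
// function behind a common interface instead of a binary name.

export interface SpawnOpts {
  image: string;
  env: Record<string, string>;
  mounts: { hostPath: string; containerPath: string; readOnly?: boolean }[];
  runAsUser?: number;                                  // docker --user
  hostAliases?: { ip: string; hostnames: string[] }[]; // docker --add-host
}

export interface ContainerRuntime {
  spawn(opts: SpawnOpts): Promise<{ id: string; wait(): Promise<number> }>;
  cleanupOrphans(installSlug: string): Promise<void>;
}

// Env-toggled selection; implementations are passed in so this stays
// decoupled from the Docker / Apple / K8s modules.
export function selectRuntime(
  impls: Record<string, () => ContainerRuntime>,
): ContainerRuntime {
  const mode = process.env.CONTAINER_RUNTIME ?? 'docker'; // 'docker' | 'apple' | 'k8s'
  const make = impls[mode];
  if (!make) throw new Error(`unknown CONTAINER_RUNTIME: ${mode}`);
  return make();
}
```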
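For the flag → PodSpec translation, a sketch using the types from @kubernetes/client-node. The label key, namespace, PVC wiring, and the credential-proxy Service name are all assumptions:

```ts
import * as k8s from '@kubernetes/client-node';

// Sketch: translate the current docker-run flags into a PodSpec.
export function buildAgentPod(opts: {
  name: string;
  image: string;
  installSlug: string;
  env: Record<string, string>;
  sessionPvc: string;   // PVC bound to the NFS-backed session dir
  runAsUser?: number;   // docker --user
}): k8s.V1Pod {
  return {
    metadata: {
      name: opts.name,
      labels: { 'nanoclaw/install-slug': opts.installSlug }, // hypothetical label key
    },
    spec: {
      restartPolicy: 'Never', // replaces --rm; completed pods keep their logs
      securityContext:
        opts.runAsUser != null ? { runAsUser: opts.runAsUser } : undefined,
      containers: [
        {
          name: 'agent',
          image: opts.image,
          // docker -e → env; the credential-proxy URL would point at e.g.
          // http://credential-proxy.nanoclaw.svc:PORT (hypothetical Service)
          env: Object.entries(opts.env).map(([name, value]) => ({ name, value })),
          volumeMounts: [{ name: 'session', mountPath: '/workspace' }], // docker -v
        },
      ],
      volumes: [
        { name: 'session', persistentVolumeClaim: { claimName: opts.sessionPvc } },
      ],
      // docker --add-host → spec.hostAliases would slot in here as well
    },
  };
}
```

restartPolicy: Never means the completed pod (and its logs) sticks around until cleanup, which is the log-retention win from the per-pod-logs bullet above.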
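For heartbeat, a pod watch could replace the fs.statSync poll. A sketch assuming a nanoclaw namespace and the same hypothetical label key:

```ts
import * as k8s from '@kubernetes/client-node';

// Sketch: watch pod lifecycle instead of polling /workspace/.heartbeat mtimes.
async function watchAgentPods(installSlug: string): Promise<void> {
  const kc = new k8s.KubeConfig();
  kc.loadFromDefault();
  const watch = new k8s.Watch(kc);
  await watch.watch(
    '/api/v1/namespaces/nanoclaw/pods',
    { labelSelector: `nanoclaw/install-slug=${installSlug}` },
    (type, pod: k8s.V1Pod) => {
      // type is ADDED | MODIFIED | DELETED; phase transitions stand in for
      // heartbeat freshness (Succeeded/Failed map to session exit).
      console.log(type, pod.metadata?.name, pod.status?.phase);
    },
    (err) => {
      if (err) console.error('watch closed:', err); // reconnect in real code
    },
  );
}
```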
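And the orphan-cleanup equivalent: list by the install-slug label, delete completed pods. Signatures shown are the 0.x @kubernetes/client-node style (positional args); the 1.x client takes a single object argument instead:

```ts
import * as k8s from '@kubernetes/client-node';

// Sketch of cleanupOrphans() against the K8s API; namespace and label key
// are assumptions, as above.
async function cleanupOrphanPods(installSlug: string): Promise<void> {
  const kc = new k8s.KubeConfig();
  kc.loadFromDefault();
  const core = kc.makeApiClient(k8s.CoreV1Api);
  const { body } = await core.listNamespacedPod(
    'nanoclaw',
    undefined, undefined, undefined, undefined,
    `nanoclaw/install-slug=${installSlug}`, // labelSelector
  );
  for (const pod of body.items) {
    const phase = pod.status?.phase;
    if (phase === 'Succeeded' || phase === 'Failed') {
      await core.deleteNamespacedPod(pod.metadata!.name!, 'nanoclaw');
    }
  }
}
```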
What I'd like out of this issue
Just to flag the use case and architectural shape. Not asking you to take it on — happy to do the work on a fork branch if you don't want it in trunk, but wanted to surface the design considerations first in case you've already thought about it or have opinions on the abstraction boundary.
🤖 Generated with Claude Code