fix(session-manager): chown new session dirs when host runs as root by netadmincmh-hash · Pull Request #2353 · nanocoai/nanoclaw

netadmincmh-hash · 2026-05-08T20:37:32Z

Summary

Linux installs that run NanoClaw as root with the data directory on a network filesystem hit an unrecoverable container spawn loop. Two constraints collide:

1. The agent image ships with USER node (uid 1000), and Claude Code refuses to run as root:
```
--dangerously-skip-permissions cannot be used with root/sudo privileges
```
   So passing --user 0:0 to docker run is not a workaround.
2. The host process writes inbound.db and the session-folder scaffolding as uid 0. On a network filesystem (NFS in my case), the container's uid 1000 cannot write outbound.db or touch the heartbeat file. bun:sqlite surfaces this as:
```
Fatal error: attempt to write a readonly database
```

The container exits with code 1 almost immediately after agent-runner startup (about 370 ms after spawn in the reproducer below). The host sweep retries a few times, then marks the inbound message completed without ever sending a response.

What this changes

initSessionFolder chowns each freshly-created session directory to 1000:1000 when process.getuid() === 0. No-op for any non-root host UID — those paths fall through to the existing --user $hostUid:$hostGid mapping in container-runner.ts. execFileSync('chown', ...) is best-effort: if it fails, the agent fails later with the clearer SQLite error and the sweep retries until the operator notices.
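To illustrate the behavior described above, here is a minimal TypeScript sketch. It is not the code from this PR's diff: the helper name chownSessionDirIfRoot and the injectable deps parameter are hypothetical, added only so the logic can be exercised without actually running as root. The 1000:1000 owner, the recursive chown, and the use of execFileSync come from the description and test plan.

```typescript
import { execFileSync } from "node:child_process";

// Assumption taken from the PR text: the image's `USER node` is uid/gid 1000.
const CONTAINER_UID = 1000;
const CONTAINER_GID = 1000;

// Hypothetical seam so the root check and the chown call can be stubbed.
type ChownDeps = {
  getuid: () => number | undefined;
  run: (file: string, args: string[]) => void;
};

const defaultDeps: ChownDeps = {
  // process.getuid does not exist on Windows, hence the optional call.
  getuid: () => process.getuid?.(),
  run: (file, args) => {
    execFileSync(file, args, { stdio: "ignore" });
  },
};

// Hypothetical helper: returns true only when chown was attempted and succeeded.
export function chownSessionDirIfRoot(
  sessionDir: string,
  deps: ChownDeps = defaultDeps,
): boolean {
  // Non-root hosts are left alone; they rely on the existing --user mapping.
  if (deps.getuid() !== 0) return false;
  try {
    deps.run("chown", ["-R", `${CONTAINER_UID}:${CONTAINER_GID}`, sessionDir]);
    return true;
  } catch {
    // Best-effort: swallow the failure and let the later SQLite error surface.
    return false;
  }
}
```

Passing the path as an argument array to execFileSync, rather than building a shell string, avoids quoting problems if a session path ever contains spaces.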

Test plan

pnpm run build passes (no new deps, no type changes).
On the affected host (root + NFS-backed /pods), reproduced the spawn loop on main. With this patch applied, chown -R 1000:1000 <session-dir> runs at session create, the container's node user can write outbound.db, and Telegram round-trip completes (verified end-to-end).
Doesn't run when host UID != 0, so no behavior change for the normal Mac/Linux-as-non-root install.

Reproducer

```
# On a host running as root, with /pods on NFS:
sudo -i
cd /pods/nanoclaw-v2
systemctl start nanoclaw-v2-<slug>
# Send any inbound to the bot. Container exits ~370ms after spawn.
journalctl -u nanoclaw-v2-<slug> | grep "readonly database"
```

Notes / things to discuss

The hard-coded 1000:1000 matches the image's USER node directive but isn't future-proof if the image ever changes UID. If you'd prefer, this could be made configurable via env var (e.g. NANOCLAW_CONTAINER_UID:GID) or read from the image at startup. Happy to fold in either approach.
Existing session dirs created before the patch is applied won't be chowned automatically — operators on affected setups will need a one-time chown -R 1000:1000 data/v2-sessions/.
This was discovered finishing a v1→v2 migration; full incident notes in the migration record (separate from this PR).
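If the configurable route from the first note is preferred, the parsing side could be as small as the sketch below. Everything here is an assumption for discussion: the function name resolveContainerOwner, the single "uid:gid" string format, and the idea of reading it from an env var such as a hypothetical NANOCLAW_CONTAINER_OWNER are not part of this PR. Malformed or missing values fall back to the current hard-coded 1000:1000.

```typescript
// Hypothetical shape for the resolved container owner.
type ContainerOwner = { uid: number; gid: number };

// Default mirrors the image's `USER node` (uid/gid 1000) as stated in the PR.
const DEFAULT_OWNER: ContainerOwner = { uid: 1000, gid: 1000 };

// Hypothetical helper: parse a "uid:gid" string, falling back to the default
// when the value is missing or not two non-negative integers.
export function resolveContainerOwner(raw: string | undefined): ContainerOwner {
  if (!raw) return { ...DEFAULT_OWNER };
  const match = /^(\d+):(\d+)$/.exec(raw.trim());
  if (!match) return { ...DEFAULT_OWNER };
  return { uid: Number(match[1]), gid: Number(match[2]) };
}

// Example call site (env var name is an assumption, not an existing setting):
// const owner = resolveContainerOwner(process.env.NANOCLAW_CONTAINER_OWNER);
```

Falling back silently keeps today's behavior for existing installs; logging a warning on a malformed value would be a reasonable alternative.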

🤖 Generated with Claude Code

Two constraints collide on a Linux install where the NanoClaw host runs as root and the data directory is on a network filesystem (NFS, etc.):

1. The agent image ships with USER node (uid 1000) and Claude Code refuses to run as root with the error:
   --dangerously-skip-permissions cannot be used with root/sudo privileges
   so we cannot pass --user 0:0 to docker run as a workaround.
2. The host writes inbound.db and the session-folder scaffolding as uid 0. On a network filesystem the container's uid 1000 cannot write outbound.db or touch the heartbeat file, and bun:sqlite surfaces this as:
   Fatal error: attempt to write a readonly database

The result is an unrecoverable spawn loop: every container exits with code 1 almost immediately after agent-runner startup, and the host sweep marks the inbound message completed after a few retries.

This patch chowns each freshly-created session directory to 1000:1000 when process.getuid() === 0. It is a no-op when the host already runs as the container UID (1000) or any other non-root UID; those paths fall through to the existing --user $hostUid:$hostGid mapping in container-runner.ts. chown is best-effort: if it fails, the agent will fail later with the clearer SQLite error and the sweep retries until the operator notices.

Reproducer: run nanoclaw-v2 as root with /pods on NFS, send any inbound message; container exits code=1 with 'attempt to write a readonly database'.

netadmincmh-hash requested review from gabi-simons and gavrielc as code owners May 8, 2026 20:37

netadmincmh-hash mentioned this pull request May 8, 2026

feat: Kubernetes container runtime for agent spawning #2354

Open

This was referenced May 9, 2026

🦞 OpenClaw Ecosystem Daily 2026-05-09 zx0828/big_model_radar#48

Open

🦞 OpenClaw Ecosystem Daily 2026-05-09 ivanweng2077/big_model_radar#16

Open

netadmincmh-hash wants to merge 1 commit into nanocoai:main from netadmincmh-hash:fix/chown-session-dirs-when-host-is-root
