fix(opencode): kill server process group + configurable IDLE_TIMEOUT_MS by glifocat · Pull Request #2152 · nanocoai/nanoclaw

glifocat · 2026-04-30T10:57:13Z

Type of Change

Feature skill - adds a channel or integration (source code changes + SKILL.md)
Utility skill - adds a standalone tool (code files in .claude/skills/<name>/, no source changes)
Operational/container skill - adds a workflow or agent skill (SKILL.md only, no source changes)
Fix - bug fix or security fix to source code
Simplification - reduces or simplifies source code
Documentation - docs, README, or CONTRIBUTING changes only

Description

Closes #2148. Closes #2149.

Two related bugs in the OpenCode provider that fire together when a local backend (Ollama, llama.cpp) is slower than the hardcoded 90 s event timeout. Bundled into a single PR because they share container/agent-runner/src/providers/opencode.ts and a small helper.

#2148 — `proc.kill('SIGKILL')` leaks the underlying binary, holding port 4096

spawn('opencode', ...) runs the npm opencode-ai wrapper script, which execs the platform binary opencode-linux-*/bin/opencode. That binary is the actual listener on 127.0.0.1:4096. SIGKILL on the wrapper PID either races with the exec or arrives after the listener has already detached, so the binary survives and the port stays bound. The next spawnOpencodeServer call then fails with Failed to start server on port 4096 / EADDRINUSE.

Fix: spawn detached and signal the whole process group via a new killProcessTree(proc) helper that calls process.kill(-pid, 'SIGKILL') (with a fallback to plain proc.kill('SIGKILL') if the negative-PID call throws — covers the case where the spawn never made it into a process group).

Both call sites updated:

startup-timeout cleanup in spawnOpencodeServer
destroySharedRuntime
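
A minimal sketch of the group-kill-with-fallback pattern described above, not the code merged in this PR. The `KillableProcess` and `GroupKill` types, the injected `groupKill` parameter, and the string return value are hypothetical, added so the sketch runs without Node type definitions. In the provider, `groupKill` would be `process.kill`, and the child would be spawned with `detached: true` so that it leads its own process group.

```typescript
// Hypothetical minimal shape of a spawned child; Node's ChildProcess should satisfy it.
interface KillableProcess {
  pid?: number;
  kill(signal?: "SIGKILL"): boolean;
}

// Hypothetical injection point; in real code this would wrap process.kill.
type GroupKill = (pid: number, signal: string) => void;

// Signal the whole process group (negative PID), falling back to the
// direct child if the group signal cannot be delivered.
function killProcessTree(
  proc: KillableProcess,
  groupKill: GroupKill,
): "group" | "direct" {
  const pid = proc.pid;
  // Guard against a missing or non-positive PID: a negated 0 would
  // target the caller's own process group.
  if (typeof pid === "number" && pid > 0) {
    try {
      groupKill(-pid, "SIGKILL");
      return "group";
    } catch {
      // Fall through, e.g. when the child never became a group leader.
    }
  }
  proc.kill("SIGKILL");
  return "direct";
}

// Assumed usage with Node (not executed here):
//   const proc = spawn('opencode', [/* args */], { detached: true });
//   killProcessTree(proc, (pid, sig) => process.kill(pid, sig));
```

Returning which path was taken is only there to make the sketch easy to check; the helper in the PR is described as a plain kill with a fallback.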

#2149 — Configurable idle timeout

IDLE_TIMEOUT_MS = 90_000 was hardcoded. It is meant as a between-events watchdog, but on a freshly prompted session it effectively caps time-to-first-token (TTFT). That is fine for cloud APIs (sub-second TTFT) but too tight for local 30B+ inference on a cold start.
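
To illustrate why a between-events watchdog doubles as a TTFT ceiling, here is a small sketch with an explicit clock. The `IdleWatchdog` class and its method names are hypothetical and are not taken from opencode.ts; the point is only that before the first event arrives, the idle interval is measured from the moment the prompt was sent.

```typescript
// Hypothetical idle watchdog driven by caller-supplied timestamps (ms).
class IdleWatchdog {
  private readonly timeoutMs: number;
  private lastActivityMs: number;

  constructor(timeoutMs: number, promptSentAtMs: number) {
    this.timeoutMs = timeoutMs;
    // Until the first event, "last activity" is the prompt itself.
    this.lastActivityMs = promptSentAtMs;
  }

  // Call on every streamed event to push the deadline forward.
  touch(nowMs: number): void {
    this.lastActivityMs = nowMs;
  }

  // True once more than timeoutMs has elapsed since the last activity.
  isExpired(nowMs: number): boolean {
    return nowMs - this.lastActivityMs > this.timeoutMs;
  }
}

// Cold local model: no token for 120 s, so a 90 s watchdog trips.
const cold = new IdleWatchdog(90_000, 0);
const coldTripped = cold.isExpired(120_000);

// Same wait under a 300 s ceiling: still alive.
const relaxed = new IdleWatchdog(300_000, 0);
const relaxedTripped = relaxed.isExpired(120_000);
```

With events flowing, each `touch` resets the interval, so a long generation does not trip the watchdog as long as no single gap exceeds the timeout.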

Fix: read OPENCODE_IDLE_TIMEOUT_MS from env, default to 300_000 (5 min). Generous for cloud, just enough for slow local. Per-group override via container.json env, e.g. "OPENCODE_IDLE_TIMEOUT_MS": "600000" — no rebuild needed since src/ is bind-mounted.
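
A hedged sketch of reading the override with the 300_000 default. The function name `resolveIdleTimeoutMs` is hypothetical, and the handling of empty, non-numeric, or non-positive values (fall back to the default) is an assumption; the PR text only states the variable name and the default.

```typescript
const DEFAULT_IDLE_TIMEOUT_MS = 300_000; // 5 min, per the PR description

// Takes the environment as a plain record so it can be exercised
// without touching process.env; pass process.env in real code.
function resolveIdleTimeoutMs(
  env: Record<string, string | undefined>,
): number {
  const raw = env["OPENCODE_IDLE_TIMEOUT_MS"];
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_IDLE_TIMEOUT_MS;
  }
  const parsed = Number(raw);
  // Assumption: invalid or non-positive values fall back to the default.
  return Number.isFinite(parsed) && parsed > 0
    ? parsed
    : DEFAULT_IDLE_TIMEOUT_MS;
}

// Mirrors the container.json example value from the description.
const overridden = resolveIdleTimeoutMs({ OPENCODE_IDLE_TIMEOUT_MS: "600000" });
const unset = resolveIdleTimeoutMs({});
```

Falling back rather than throwing keeps a typo in container.json from taking the provider down, at the cost of silently ignoring the bad value; logging a warning there would be a reasonable addition.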

Tests

No behavior-changing additions. Manually verified:

Process group kill: docker exec <container> pgrep -af opencode no longer shows orphan [opencode] <defunct> after a forced timeout; 127.0.0.1:4096 is free immediately.
Configurable timeout: env override applied; default 300 s confirmed when var unset.

Compounding behavior

Without #2148 fixed, every timeout from the 90 s ceiling (or any idle ceiling) leaks a process and renders the agent container unusable until restarted. Fixing one without the other is half a fix — that's why they're filed together.

For Skills

SKILL.md contains instructions, not inline code (code goes in separate files)
SKILL.md is under 500 lines
I tested this skill on a fresh clone

Not a skill PR — section N/A.

Two bugs in the upstream OpenCode provider that fire together when a local backend (Ollama, llama.cpp) is slower than the hardcoded 90s event timeout:

1. proc.kill('SIGKILL') only kills the wrapper process the spawn returned, not the opencode-linux-*/bin/opencode child it execs into. The child keeps holding port 4096, so the next spawnOpencodeServer() fails with "Failed to start server on port 4096" / EADDRINUSE. Fix: spawn detached and signal the whole process group via process.kill(-pid, 'SIGKILL') in a new killProcessTree() helper.
2. IDLE_TIMEOUT_MS = 90_000 is hardcoded. For a local 31B model the first prompt's time-to-first-token routinely exceeds that, tripping the timeout. Fix: read OPENCODE_IDLE_TIMEOUT_MS from env, default 300_000 (5 min), which is generous for cloud APIs and just enough for local. Per-group override goes in container.json env (e.g. "600000" for a slow local box), no rebuild needed since src/ is bind-mounted.

Same bugs exist on origin/providers; should be ported upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gavrielc · 2026-05-01T15:40:46Z

Thanks!

glifocat requested review from gabi-simons and gavrielc as code owners April 30, 2026 10:57

github-actions bot added the follows-guidelines (PR was created using the current contributing template) and PR: Fix (Bug fix) labels Apr 30, 2026

gavrielc merged commit b429ab3 into nanocoai:providers May 1, 2026
1 check passed

bkutasi mentioned this pull request May 1, 2026

🦞 OpenClaw Ecosystem Digest 2026-05-02 bkutasi/big_model_radar#2

Open

This was referenced May 2, 2026

🦞 OpenClaw Ecosystem Daily 2026-05-02 gsscsd/big_model_radar#281

Open

🦞 OpenClaw Ecosystem Daily 2026-05-02 borq168/big_model_radar#106

Open

🦞 OpenClaw Ecosystem Daily 2026-05-02 zx0828/big_model_radar#6

Open


gavrielc merged 1 commit into nanocoai:providers from glifocat:fix/opencode-process-group-and-timeout
