Commit 068c10e
fix(rlm): harden executor surface, agent search envelope, and OpenCode hook (#870)
* chore(workspace): unblock pre-commit hook for hardening branch
Three pre-existing failures on main blocking the pre-commit hook; this
branch needs them resolved before adding RLM hardening commits.
1. Five orchestrator integration tests construct OrchestratorConfig
literally and were not updated when GiteaSkillRepoConfig was added
in 2d41798. Add gitea_skill_repo: None to each fixture.
2. main.rs:4240 was added in 2d41798 (#1486) calling
api.get_thesaurus(&role_name) where role_name is &RoleName but the
method expects &str. Also moved role_name twice into SearchQuery.
Smallest fix: clone() at the SearchQuery moves and pass
role_name.as_str() to get_thesaurus. The whole block is replaced
wholesale in the upcoming concepts_matched helper extraction, so
this is a temporary unblocker.
3. terraphim_service llm::router_config tests share process env:
test_env_overrides sets LLM_PROXY_URL without serialising against
test_merged_config_from_role, so parallel execution caused the
latter to read the override and fail. Mark both #[serial] and add
a defensive remove_var at the start of test_merged_config_from_role.
cargo check --workspace --all-targets and cargo test -p
terraphim_service --lib llm::router_config are clean after these changes.
* fix(rlm-local): honour ctx.timeout_ms, kill_on_drop, NotSupported snapshots
LocalExecutor now respects the caller's ExecutionContext.timeout_ms
instead of a hardcoded 30s, and spawns children with kill_on_drop(true)
so timed-out processes are reaped instead of leaking.
Snapshot operations now return RlmError::NotSupported instead of
fabricating a SnapshotId for a no-op, giving callers an honest signal
that VM-style state versioning is unavailable on this backend.
Adds ExecutionEnvironment::end_session(&SessionId) trait method with a
default Ok(()) impl, so backends with per-session resources (Docker)
can override without disturbing existing impls.
Removes the unused output_dir field and the unnecessary SessionId clone.
Tests: 7 unit tests in local.rs (timeout-honoured, child-killed,
snapshots-not-supported, end-session-default), 1 in error.rs
(NotSupported error code, display, retryability).
Refs review of e4d896d..4442671
* fix(rlm-docker): per-session lock, lifecycle hook, resource limits
Closes the TOCTOU race in ensure_container by replacing the
parking_lot::RwLock<HashMap<...>> with a DashMap<SessionId,
Arc<tokio::sync::Mutex<Option<String>>>>. Concurrent execute_* calls
for the same session_id now serialise on the per-session Mutex; only
one container is created.
Adds DockerExecutor::release_session_container(&SessionId) inherent
method, mirroring FirecrackerExecutor::release_session_vm. Also wires
ExecutionEnvironment::end_session to call it, so the supervisor can
release containers via the trait when needed.
Adds default_host_config() applying the permissive profile per
2026-05-15 design decision: 512 MiB memory cap, 256 PIDs cap, all caps
dropped, network=bridge (LLM bridge / pip use), readonly_rootfs=false
(Python /tmp writes).
Replaces sleep 3600 keepalive with sleep infinity so the container
outlives any reasonable session and is torn down only by
release_session_container, cleanup, or Drop.
Replaces snapshot method bodies with RlmError::NotSupported. Replaces
`as i32` cast on bollard's i64 exit code with i32::try_from. Removes
the #[allow(dead_code)] attribute on the struct (project rule: no dead
code) and the now-redundant config field.
Tests: 8 in docker.rs (6 daemon-free, 2 gated on --ignored): host
config profile, snapshots-not-supported, release-unknown-returns-none,
release-removes-and-recreates, concurrent-ensure-no-leak.
Refs review of e4d896d..4442671
* fix(rlm-selector): degrade gracefully, drop E2B fallthrough lie
Three fixes to select_executor:
1. The E2B arm previously logged "Selected E2B backend" then fell
through without continuing, misleading operators reading logs.
Now logs at debug! and explicitly continues to the next backend.
2. The Docker arm previously short-circuited the entire fallback chain
if DockerExecutor::new returned Err (e.g. CLI present, daemon
unreachable). Now logs at warn!, pushes to `tried`, and continues
to the next backend so Local remains selectable.
3. The Local arm previously logged "Selected Local backend (no
isolation)" at info! - falling back to no-isolation is a
security-posture downgrade and should be visible in production logs.
Now logs at warn! including the `tried` context so operators can see
why the secure backends were unavailable.
Also caches `is_docker_available()` once per call (avoiding repeated
~50-100ms shell-outs to `docker --version`) and tightens the helper
log to "CLI not present" since it does not actually probe the daemon.
Tests: 2 new unit tests in mod.rs (Local-preference returns Local,
E2B-unimplemented falls through to Local).
Refs review of e4d896d..4442671
* fix(agent): replace duplicated concepts_matched logic with helper
Adds two public helpers to terraphim_automata::matcher:
- compute_concepts_matched(query, &Thesaurus) -> Vec<String>:
the small ergonomic wrapper around find_matches that the agent's
robot-search envelope needs in two places.
- thesaurus_from_terms(&RoleName, impl IntoIterator<Item=&str>) ->
Thesaurus: builds a minimal Thesaurus from a slice of plain term
strings, assigning each a unique id via NormalizedTerm::with_auto_id.
Replaces inline blocks at main.rs:2177-2182 (offline) and 4247-4268
(server) with the new helpers. The server-mode call site previously
assigned 1u64 to every reconstructed NormalizedTerm; the new
with_auto_id reconstruction is semantically honest. Both sites log
thesaurus failures at debug! instead of silently swallowing them.
The /thesaurus/{role} HTTP API stays untouched.
Tests: 5 in automata::compute_concepts_matched_tests including a
parity test guarding offline/server thesaurus shapes producing the
same concepts_matched output.
Refs review of e4d896d..4442671
* fix(rlm-hook): portable timeout, safe JSON, smoke tests
terraphim-rlm-hook.sh:
- Replace string-interpolated JSON with jq -n --arg/--argjson so prompts
containing double quotes, backslashes, or other shell-meta chars no
longer corrupt the JSON-RPC request body.
- Replace `timeout 30` (GNU coreutils, absent on macOS by default) with
a portable POSIX subshell timeout wrapper. The hook now works on macOS
without `brew install coreutils`.
- Drop the `2>/dev/null` redirect on the MCP invocation so operators
retain stderr for debugging failed calls.
install.sh now warns (does not block) if `jq` is missing.
README.md documents the `jq` dependency, macOS portability, the
known-limitation of the JS plugin's per-call MCP-server spawn, and the
new tests/test_hook.sh smoke-test entry point.
tests/test_hook.sh exercises the hook with a quoted prompt (asserts
the captured payload is valid JSON), a passthrough non-RLM command, and
the portable timeout wrapper on a stripped PATH that excludes GNU
timeout / gtimeout. Tests stub $TERRAPHIM_MCP so no live MCP server is
needed.
Refs review of e4d896d..4442671
* fix(orchestrator-config): redact GiteaSkillRepoConfig.token, set cache_dir default
Implements Debug manually so the optional `token` field appears as
"***REDACTED***" instead of leaking via panic backtraces or trace
dumps. The struct no longer derives Debug.
Provides a sensible default for `cache_dir` instead of an empty
PathBuf: prefer $XDG_CACHE_HOME, then $HOME/.cache, then $TMPDIR,
always under terraphim/skills/.
Tests: 2 unit tests in terraphim_orchestrator::config::tests covering
the redacted Debug output and the cache_dir default.
Refs review of e4d896d..4442671
* fix(rlm-hardening): self-review follow-ups (P1+P2)
Addresses every finding from the structural self-review of the
hardening branch.
P1 fixes:
1. Wire end_session trait method through TerraphimRlm::destroy_session
(rlm.rs:194). FirecrackerExecutor overrides end_session to call the
inherent release_session_vm, so destroying a session now releases
per-backend resources via a single trait call. Adds a regression
test (test_destroy_session_calls_executor_end_session) that uses an
instrumented MockExecutor with an AtomicUsize counter.
2. Fix run_with_timeout PID-reuse race in terraphim-rlm-hook.sh by
running the child in its own process group (setsid where available)
and targeting kill at the negative pgid instead of a recyclable pid.
Falls back to plain pid kill on systems without setsid.
P2 fixes:
3. Drop #[ignore] from the two real-Docker integration tests
(test_docker_concurrent_ensure_no_leak and
test_docker_release_session_container_removes). Add a
skip_unless_image_ready helper that gates on both daemon presence
AND the default image being cached locally; tests skip cleanly with
a hint to `docker pull` if the image is missing.
4. Hook smoke test now also asserts prompt content fidelity
(prompt_content_roundtrip) by extracting params.arguments.prompt
from the captured payload and comparing it byte-for-byte to the
original input. Replaces the awk-based RLM args extractor with
bash parameter expansion that preserves whitespace and literal
quotes.
5. Force run_with_timeout test 3 to actually exercise the portable
wrapper on Linux by staging a private $PATH (only sleep, kill,
bash, sh, dash, setsid, grep, cut, head; no timeout / gtimeout) and
sourcing the function definitions directly from the hook script.
Test no longer self-skips on systems shipping GNU coreutils.
5a. Add DockerExecutor::with_host_config(HostConfig) builder so callers
can override the permissive default profile (network=bridge,
readonly_rootfs=false). Test confirms override propagates.
5c. Replace the 200ms fixed sleep in test_local_kills_on_timeout with a
bounded pgrep poll (50ms backoff up to 2s). Marker now includes a
Ulid so concurrent test runs don't collide.
5d clippy: convert five `let mut x = T::default(); x.field = ...`
patterns to struct-update syntax, silencing the
field_reassign_with_default warnings.
Test count: 128 in terraphim_rlm lib (was 126), 4 in hook smoke tests
(was 3). Zero clippy warnings on touched crates. cargo fmt clean.
* docs(rlm-hardening): add Phase 1 research and Phase 2 design
Audit trail for the disciplined-development workflow that produced this
branch. Captures:
- Vital-few constraints (no new deps, no trait churn, no API change).
- Eliminated alternatives (capability-matcher rewrite, /thesaurus/ API
enrichment, container pooling) with rationale per item.
- Five risks with assigned mitigations and three open questions
resolved during implementation.
- File-level change list, function signatures, test strategy, and
step sequencing matching the seven commits on this branch.
* fix(orchestrator): prevent cron re-triggering within same schedule window
Add last_cron_fire HashMap to track per-agent last fire time, preventing
agents from firing multiple times within the same cron schedule window.
Fixes: agents firing 130+ times instead of ~11 times per schedule cycle
Test: 11 synthetic time tests covering AC1-AC5, compound schedules,
and boundary conditions
Refs #IDX
* docs(rlm-hardening): add session handover and quality eval docs
Refs #1488
* docs(handover): Echo verification session for #1488
Refs #1488
* docs(handover): Echo re-verification session for #1488
Refs #1488
---------
Co-authored-by: Alex <alex@example.com>
Co-authored-by: forge-admin <admin@terraphim-forge.local>1 parent 9832fc9 commit 068c10e
32 files changed
Lines changed: 3098 additions & 141 deletions
File tree
- .docs
- crates
- terraphim_agent/src
- terraphim_automata/src
- terraphim_orchestrator
- src
- tests
- terraphim_rlm/src
- executor
- terraphim_service/src/llm
- docs/handovers
- examples/opencode-plugin-rlm
- tests
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
0 commit comments