Skip to content

fix(rlm): harden executor surface, agent search envelope, and OpenCode hook#870

Merged
AlexMikhalev merged 15 commits into
mainfrom
task/rlm-executor-hardening
May 16, 2026
Merged

fix(rlm): harden executor surface, agent search envelope, and OpenCode hook#870
AlexMikhalev merged 15 commits into
mainfrom
task/rlm-executor-hardening

Conversation

@AlexMikhalev
Copy link
Copy Markdown
Contributor

Summary

Addresses every finding from the structural review of commits e4d896d3d..4442671e9 (LocalExecutor + DockerExecutor + OpenCode plugin + agent concepts_matched fix). 9 commits, no new dependencies, no public HTTP API change.

  • fix(rlm-local) — honour ctx.timeout_ms, kill_on_drop(true), snapshots return RlmError::NotSupported. Adds end_session(&SessionId) to the ExecutionEnvironment trait with default Ok(()) impl.
  • fix(rlm-docker)DashMap<SessionId, Arc<tokio::sync::Mutex<Option<String>>>> closes the TOCTOU race in ensure_container. Adds release_session_container inherent method. host_config with permissive limits (memory 512 MiB, pids 256, drop ALL caps, network=bridge, readonly_rootfs=false). sleep infinity keepalive. Removes #[allow(dead_code)].
  • fix(rlm-selector) — E2B arm logs debug! and continues (no more "Selected E2B" lie); Docker init failure degrades to next backend; Local fallback logs warn! with tried context; is_docker_available() cached once per call.
  • fix(agent) — Extracts compute_concepts_matched and thesaurus_from_terms to public terraphim_automata::matcher; both call sites use them; server path uses with_auto_id (no more id=1u64 collision). Parity test guards against future drift.
  • fix(rlm-hook)jq -n --arg/--argjson for safe JSON; portable POSIX run_with_timeout (process-group kill via setsid); stderr propagated. 4 smoke tests.
  • fix(orchestrator-config) — Manual Debug redacts GiteaSkillRepoConfig.token; cache_dir defaults to $XDG_CACHE_HOME / $HOME/.cache / $TMPDIR (under terraphim/skills).
  • fix(rlm-hardening): self-review follow-ups — Wires end_session through TerraphimRlm::destroy_session; FirecrackerExecutor overrides to call release_session_vm. Process-group kill in hook watchdog. Removes #[ignore] from Docker integration tests; gates them on image presence (skip_unless_image_ready). Hook smoke test now also asserts prompt content fidelity (prompt_content_roundtrip); awk extractor replaced by bash parameter expansion. Adds DockerExecutor::with_host_config(HostConfig) builder. Replaces 200ms sleep heuristic in test_local_kills_on_timeout with bounded pgrep poll. Forces run_with_timeout test to actually exercise wrapper on Linux via private-PATH staging. Five clippy field_reassign_with_default warnings cleared via struct-update syntax.
  • docs(rlm-hardening) — Phase 1 research and Phase 2 design under .docs/ as the disciplined-development audit trail.
  • chore(workspace): unblock pre-commit hook — Pre-existing main-branch fixups: gitea_skill_repo: None added to 5 orchestrator integration test fixtures; .as_str()+clone() workaround for the broken agent code that Step 4 then replaces; #[serial] + defensive env::remove_var on the env-polluting llm::router_config tests.

Test plan

  • cargo fmt --all -- --check clean
  • cargo clippy -p terraphim_rlm -p terraphim_automata -p terraphim_agent -p terraphim_orchestrator --all-targets produces 0 warnings on touched code
  • cargo test -p terraphim_rlm --features full --lib — 128 / 128 (was 124; integration tests now run when image is cached, otherwise skip cleanly)
  • cargo test -p terraphim_automata --lib — 76 / 76 (5 new compute_concepts_matched_tests incl. parity)
  • cargo test -p terraphim_orchestrator --lib config:: — 68 / 68 (2 new gitea_skill_repo tests)
  • cargo test --workspace --lib — every package green
  • bash examples/opencode-plugin-rlm/tests/test_hook.sh — 4 / 4
  • Reviewer: optional docker pull python:3.11-slim then re-run RLM tests to exercise the live integration tests (test_docker_concurrent_ensure_no_leak, test_docker_release_session_container_removes)
  • Reviewer: spot-check the disciplined-development docs under .docs/research-rlm-executor-hardening.md and .docs/implementation-plan-rlm-executor-hardening.md

Diff size

24 files changed, 2137 insertions(+), 130 deletions(-) (~940 lines are the audit-trail docs)

Notes

  • Source of work is the in-conversation structural review; no Gitea/GitHub issue. Each commit message contains Refs review of e4d896d3d..4442671e9 for traceability.
  • All findings from a follow-up self-review (P1: dead trait API, P1: PID-reuse race; P2: ignored tests, hook test fidelity, host_config builder, sleep heuristic, Linux test self-skip) addressed in commit 9cf6ec2d3 before pushing.

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

Alex and others added 12 commits May 15, 2026 12:55
Three pre-existing failures on main blocking the pre-commit hook; this
branch needs them resolved before adding RLM hardening commits.

1. Five orchestrator integration tests construct OrchestratorConfig
   literally and were not updated when GiteaSkillRepoConfig was added
   in 2d41798. Add gitea_skill_repo: None to each fixture.

2. main.rs:4240 was added in 2d41798 (#1486) calling
   api.get_thesaurus(&role_name) where role_name is &RoleName but the
   method expects &str. Also moved role_name twice into SearchQuery.
   Smallest fix: clone() at the SearchQuery moves and pass
   role_name.as_str() to get_thesaurus. The whole block is replaced
   wholesale in the upcoming concepts_matched helper extraction, so
   this is a temporary unblocker.

3. terraphim_service llm::router_config tests share process env:
   test_env_overrides sets LLM_PROXY_URL without serialising against
   test_merged_config_from_role, so parallel execution caused the
   latter to read the override and fail. Mark both #[serial] and add
   a defensive remove_var at the start of test_merged_config_from_role.

cargo check --workspace --all-targets and cargo test -p
terraphim_service --lib llm::router_config are clean after these changes.
…pshots

LocalExecutor now respects the caller's ExecutionContext.timeout_ms
instead of a hardcoded 30s, and spawns children with kill_on_drop(true)
so timed-out processes are reaped instead of leaking.

Snapshot operations now return RlmError::NotSupported instead of
fabricating a SnapshotId for a no-op, giving callers an honest signal
that VM-style state versioning is unavailable on this backend.

Adds ExecutionEnvironment::end_session(&SessionId) trait method with a
default Ok(()) impl, so backends with per-session resources (Docker)
can override without disturbing existing impls.

Removes the unused output_dir field and the unnecessary SessionId clone.

Tests: 7 unit tests in local.rs (timeout-honoured, child-killed,
snapshots-not-supported, end-session-default), 1 in error.rs
(NotSupported error code, display, retryability).

Refs review of e4d896d..4442671
Closes the TOCTOU race in ensure_container by replacing the
parking_lot::RwLock<HashMap<...>> with a DashMap<SessionId,
Arc<tokio::sync::Mutex<Option<String>>>>. Concurrent execute_* calls
for the same session_id now serialise on the per-session Mutex; only
one container is created.

Adds DockerExecutor::release_session_container(&SessionId) inherent
method, mirroring FirecrackerExecutor::release_session_vm. Also wires
ExecutionEnvironment::end_session to call it, so the supervisor can
release containers via the trait when needed.

Adds default_host_config() applying the permissive profile per
2026-05-15 design decision: 512 MiB memory cap, 256 PIDs cap, all caps
dropped, network=bridge (LLM bridge / pip use), readonly_rootfs=false
(Python /tmp writes).

Replaces sleep 3600 keepalive with sleep infinity so the container
outlives any reasonable session and is torn down only by
release_session_container, cleanup, or Drop.

Replaces snapshot method bodies with RlmError::NotSupported. Replaces
`as i32` cast on bollard's i64 exit code with i32::try_from. Removes
the #[allow(dead_code)] attribute on the struct (project rule: no dead
code) and the now-redundant config field.

Tests: 8 in docker.rs (6 daemon-free, 2 gated on --ignored): host
config profile, snapshots-not-supported, release-unknown-returns-none,
release-removes-and-recreates, concurrent-ensure-no-leak.

Refs review of e4d896d..4442671
Three fixes to select_executor:

1. The E2B arm previously logged "Selected E2B backend" then fell
   through without continuing, misleading operators reading logs.
   Now logs at debug! and explicitly continues to the next backend.

2. The Docker arm previously short-circuited the entire fallback chain
   if DockerExecutor::new returned Err (e.g. CLI present, daemon
   unreachable). Now logs at warn!, pushes to `tried`, and continues
   to the next backend so Local remains selectable.

3. The Local arm previously logged "Selected Local backend (no
   isolation)" at info! - falling back to no-isolation is a
   security-posture downgrade and should be visible in production logs.
   Now logs at warn! including the `tried` context so operators can see
   why the secure backends were unavailable.

Also caches `is_docker_available()` once per call (avoiding repeated
~50-100ms shell-outs to `docker --version`) and tightens the helper
log to "CLI not present" since it does not actually probe the daemon.

Tests: 2 new unit tests in mod.rs (Local-preference returns Local,
E2B-unimplemented falls through to Local).

Refs review of e4d896d..4442671
Adds two public helpers to terraphim_automata::matcher:

- compute_concepts_matched(query, &Thesaurus) -> Vec<String>:
  the small ergonomic wrapper around find_matches that the agent's
  robot-search envelope needs in two places.

- thesaurus_from_terms(&RoleName, impl IntoIterator<Item=&str>) ->
  Thesaurus: builds a minimal Thesaurus from a slice of plain term
  strings, assigning each a unique id via NormalizedTerm::with_auto_id.

Replaces inline blocks at main.rs:2177-2182 (offline) and 4247-4268
(server) with the new helpers. The server-mode call site previously
assigned 1u64 to every reconstructed NormalizedTerm; the new
with_auto_id reconstruction is semantically honest. Both sites log
thesaurus failures at debug! instead of silently swallowing them.

The /thesaurus/{role} HTTP API stays untouched.

Tests: 5 in automata::compute_concepts_matched_tests including a
parity test guarding offline/server thesaurus shapes producing the
same concepts_matched output.

Refs review of e4d896d..4442671
terraphim-rlm-hook.sh:

- Replace string-interpolated JSON with jq -n --arg/--argjson so prompts
  containing double quotes, backslashes, or other shell-meta chars no
  longer corrupt the JSON-RPC request body.
- Replace `timeout 30` (GNU coreutils, absent on macOS by default) with
  a portable POSIX subshell timeout wrapper. The hook now works on macOS
  without `brew install coreutils`.
- Drop the `2>/dev/null` redirect on the MCP invocation so operators
  retain stderr for debugging failed calls.

install.sh now warns (does not block) if `jq` is missing.

README.md documents the `jq` dependency, macOS portability, the
known-limitation of the JS plugin's per-call MCP-server spawn, and the
new tests/test_hook.sh smoke-test entry point.

tests/test_hook.sh exercises the hook with a quoted prompt (asserts
the captured payload is valid JSON), a passthrough non-RLM command, and
the portable timeout wrapper on a stripped PATH that excludes GNU
timeout / gtimeout. Tests stub $TERRAPHIM_MCP so no live MCP server is
needed.

Refs review of e4d896d..4442671
…e_dir default

Implements Debug manually so the optional `token` field appears as
"***REDACTED***" instead of leaking via panic backtraces or trace
dumps. The struct no longer derives Debug.

Provides a sensible default for `cache_dir` instead of an empty
PathBuf: prefer $XDG_CACHE_HOME, then $HOME/.cache, then $TMPDIR,
always under terraphim/skills/.

Tests: 2 unit tests in terraphim_orchestrator::config::tests covering
the redacted Debug output and the cache_dir default.

Refs review of e4d896d..4442671
Addresses every finding from the structural self-review of the
hardening branch.

P1 fixes:

1. Wire end_session trait method through TerraphimRlm::destroy_session
   (rlm.rs:194). FirecrackerExecutor overrides end_session to call the
   inherent release_session_vm, so destroying a session now releases
   per-backend resources via a single trait call. Adds a regression
   test (test_destroy_session_calls_executor_end_session) that uses an
   instrumented MockExecutor with an AtomicUsize counter.

2. Fix run_with_timeout PID-reuse race in terraphim-rlm-hook.sh by
   running the child in its own process group (setsid where available)
   and targeting kill at the negative pgid instead of a recyclable pid.
   Falls back to plain pid kill on systems without setsid.

P2 fixes:

3. Drop #[ignore] from the two real-Docker integration tests
   (test_docker_concurrent_ensure_no_leak and
   test_docker_release_session_container_removes). Add a
   skip_unless_image_ready helper that gates on both daemon presence
   AND the default image being cached locally; tests skip cleanly with
   a hint to `docker pull` if the image is missing.

4. Hook smoke test now also asserts prompt content fidelity
   (prompt_content_roundtrip) by extracting params.arguments.prompt
   from the captured payload and comparing it byte-for-byte to the
   original input. Replaces the awk-based RLM args extractor with
   bash parameter expansion that preserves whitespace and literal
   quotes.

5. Force run_with_timeout test 3 to actually exercise the portable
   wrapper on Linux by staging a private $PATH (only sleep, kill,
   bash, sh, dash, setsid, grep, cut, head; no timeout / gtimeout) and
   sourcing the function definitions directly from the hook script.
   Test no longer self-skips on systems shipping GNU coreutils.

5a. Add DockerExecutor::with_host_config(HostConfig) builder so callers
    can override the permissive default profile (network=bridge,
    readonly_rootfs=false). Test confirms override propagates.

5c. Replace the 200ms fixed sleep in test_local_kills_on_timeout with a
    bounded pgrep poll (50ms backoff up to 2s). Marker now includes a
    Ulid so concurrent test runs don't collide.

5d clippy: convert five `let mut x = T::default(); x.field = ...`
   patterns to struct-update syntax, silencing the
   field_reassign_with_default warnings.

Test count: 128 in terraphim_rlm lib (was 126), 4 in hook smoke tests
(was 3). Zero clippy warnings on touched crates. cargo fmt clean.
Audit trail for the disciplined-development workflow that produced this
branch. Captures:

- Vital-few constraints (no new deps, no trait churn, no API change).
- Eliminated alternatives (capability-matcher rewrite, /thesaurus/ API
  enrichment, container pooling) with rationale per item.
- Five risks with assigned mitigations and three open questions
  resolved during implementation.
- File-level change list, function signatures, test strategy, and
  step sequencing matching the seven commits on this branch.
…ndow

Add last_cron_fire HashMap to track per-agent last fire time, preventing
agents from firing multiple times within the same cron schedule window.

Fixes: agents firing 130+ times instead of ~11 times per schedule cycle
Test: 11 synthetic time tests covering AC1-AC5, compound schedules,
      and boundary conditions

Refs #IDX
@github-actions
Copy link
Copy Markdown
Contributor

Documentation Preview

Your documentation changes have been deployed to:
https://57f9015c.terraphim-docs.pages.dev

This preview will be available until the PR is closed.

@github-actions
Copy link
Copy Markdown
Contributor

Documentation Preview

Your documentation changes have been deployed to:
https://83a22673.terraphim-docs.pages.dev

This preview will be available until the PR is closed.

@github-actions
Copy link
Copy Markdown
Contributor

Documentation Preview

Your documentation changes have been deployed to:
https://cece35ac.terraphim-docs.pages.dev

This preview will be available until the PR is closed.

@AlexMikhalev AlexMikhalev merged commit 068c10e into main May 16, 2026
19 of 23 checks passed
@AlexMikhalev AlexMikhalev deleted the task/rlm-executor-hardening branch May 16, 2026 09:29
@github-actions
Copy link
Copy Markdown
Contributor

Documentation Preview

Your documentation changes have been deployed to:
https://740c7672.terraphim-docs.pages.dev

This preview will be available until the PR is closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant