
opencode_rlm_env #1023

Merged
snimu merged 23 commits into main from sebastian/ocrlm-2026-03-13
Mar 19, 2026

Conversation

@snimu (Contributor) commented Mar 16, 2026

Description

opencode_rlm_env

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running `uv run pytest` locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Note

Medium Risk
Adds a new sandboxed agent environment that handles concurrent sub-LLM requests and changes interception/config plumbing; concurrency and request-routing changes could affect rollout stability and token accounting.

Overview
Introduces OpenCodeRLMEnv, extending OpenCodeEnv to support Recursive Language Model sub-agent calls via the snimu/oc plugin, including sandbox bootstrapping (bun + plugin install), header-based sub-LLM detection, and concurrent handling of sub-requests with semaphore-limited parallelism and optional trajectory logging.

Refactors CliAgentEnv request polling into _poll_next_request() to support the new concurrent routing, adjusts first-turn prompt capture to ignore sub-LLM steps, and updates OpenCodeEnv’s generated OpenCode config to use a fixed intercepted/model provider mapping.

Enhances the interception layer to record incoming HTTP headers per request (used for sub-LLM role detection), adds a monitor rubric for main vs sub-LLM token/turn metrics, adds comprehensive tests for config rendering and metrics, and updates environment docs to list OpenCodeEnv/OpenCodeRLMEnv.
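The header-based sub-LLM detection and semaphore-limited concurrency described above can be sketched in a few lines of asyncio. This is an illustrative sketch, not the env's actual API: `run_sub_calls` and its default parallelism are hypothetical; only the `x-rlm-role: sub` header check mirrors the PR.

```python
import asyncio
from typing import Any

def is_sub_llm_request(intercept: dict[str, Any]) -> bool:
    # The OC plugin tags sub-agent calls with an explicit header;
    # header keys are assumed lowercased by the interception layer.
    return intercept.get("headers", {}).get("x-rlm-role") == "sub"

async def run_sub_calls(
    intercepts: list[dict[str, Any]], max_parallelism: int = 4
) -> list[str]:
    # Bound concurrent sub-LLM work with a semaphore, as the env does.
    sem = asyncio.Semaphore(max_parallelism)

    async def one(intercept: dict[str, Any]) -> str:
        async with sem:
            await asyncio.sleep(0)  # stand-in for the real sub-LLM call
            return intercept["headers"]["x-rlm-role"]

    subs = [i for i in intercepts if is_sub_llm_request(i)]
    return await asyncio.gather(*(one(i) for i in subs))
```

Main-agent requests (those without the header) would fall through to the normal rollout loop instead of entering the semaphore-guarded path.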

Written by Cursor Bugbot for commit 8bf7d0a.

snimu and others added 3 commits March 15, 2026 16:13
OpenCodeRLMEnv extends OpenCodeEnv with the snimu/oc RLM plugin for
recursive sub-LLM calls. Sets env vars so the plugin routes llm-subcall
and subagent calls through the interception proxy with model="sub",
enabling concurrent handling and separate token tracking.

Includes opencode-rlm-test environment with 3 tasks exercising basic
bash, llm-subcall, and subagent capabilities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers constructor defaults, config generation (including shell
expansion), run command content, env var setup, sub-LLM detection,
state initialization, metrics tracking, and monitor rubric.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SWE-Bench Docker images use sh (dash) as default shell, which doesn't
support the bash-only `pipefail` option.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@cursor bot left a comment:

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Fire-and-forget task may be garbage collected
    • Added _sub_llm_tasks set to track task references and prevent garbage collection, following the pattern used in parent class.
  • ✅ Fixed: New environment not listed in environments README
    • Added opencode_rlm_test to environments/README.md under experimental environments section and pattern reference section.
  • ✅ Fixed: Swallowing CancelledError prevents proper task cancellation
    • Added raise statement after exception handling in _handle_sub_llm_request to properly propagate CancelledError and other exceptions.

Preview (362a510ff0)
diff --git a/environments/README.md b/environments/README.md
--- a/environments/README.md
+++ b/environments/README.md
@@ -45,6 +45,9 @@
 - **RLMEnv (Recursive Language Model)**
   - **rlm_secrets**: Puzzle environment testing RLM functionality including root-level tools, sub-LLM tool use, and file operations.
 
+- **OpenCodeRLMEnv (OpenCode with RLM plugin)**
+  - **opencode_rlm_test**: Smoke-test environment for `OpenCodeRLMEnv` demonstrating concurrent sub-LLM handling with the RLM plugin.
+
 - **HarborEnv / CliAgentEnv (CLI agent sandboxes)**
   - **opencode_harbor**: Runs the OpenCode CLI agent on Harbor tasks with API interception via Prime Tunnel.
   - **terminus_harbor**: Runs the Terminus agent on Harbor tasks with API interception via Prime Tunnel.
@@ -75,6 +78,7 @@
 - **CLI agent sandboxes**: `opencode_harbor`, `terminus_harbor`
 - **MCP integration**: `mcp_search_env`
 - **RLM (recursive LLM)**: `rlm_secrets`
+- **OpenCode RLM integration**: `opencode_rlm_test`
 - **Environment and rubric composition**: `math_group`, `math_python`, `wiki_search`
 - **Procedural datasets**: `reasoning_gym_env`
 - **Multimodal**: `mmmu`

diff --git a/verifiers/envs/experimental/opencode_rlm_env.py b/verifiers/envs/experimental/opencode_rlm_env.py
--- a/verifiers/envs/experimental/opencode_rlm_env.py
+++ b/verifiers/envs/experimental/opencode_rlm_env.py
@@ -167,6 +167,7 @@
         self.sub_timeout_ms = sub_timeout_ms
         self.include_sub_llm_in_trajectory = include_sub_llm_in_trajectory
         self._sub_llm_semaphore = asyncio.Semaphore(max_sub_llm_parallelism)
+        self._sub_llm_tasks: set[asyncio.Task[None]] = set()
 
         kwargs.setdefault("run_command_template", RLM_RUN_COMMAND_TEMPLATE)
 
@@ -304,9 +305,11 @@
 
             if self._is_sub_llm_request(intercept):
                 # Fire-and-forget: handled concurrently outside the loop
-                asyncio.create_task(
+                task = asyncio.create_task(
                     self._handle_sub_llm_request(state, request_id, intercept)
                 )
+                self._sub_llm_tasks.add(task)
+                task.add_done_callback(self._sub_llm_tasks.discard)
                 continue
 
             # Main-agent request → return to rollout loop
@@ -349,6 +352,7 @@
             except BaseException as e:
                 error = e
                 logger.warning("Sub-LLM request %s failed: %s", request_id, e)
+                raise
             finally:
                 if intercept.get("stream"):
                     await synthesize_stream(intercept, response, error)


snimu and others added 3 commits March 15, 2026 20:49
Replaced by opencode-rlm-swe in research-environments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Store asyncio.create_task references in state["_sub_llm_tasks"] set
and use done callbacks to clean up. Prevents Python from silently
dropping in-flight sub-LLM requests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows CancelledError and KeyboardInterrupt to propagate for proper
task cancellation during shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
snimu and others added 2 commits March 15, 2026 21:03
…dler

Catch BaseException to always resolve the HTTP future (preventing
hangs), but re-raise non-Exception types (CancelledError, etc.) after
delivery so task cancellation still propagates correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
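The "always resolve the future, but re-raise non-Exception types" behavior from this commit can be sketched as follows. This is a simplified illustration under assumed names (`handle_request` is hypothetical); the key points are catching `BaseException` so the waiting HTTP future never hangs, and re-raising `CancelledError`/`KeyboardInterrupt` after delivery so cancellation still propagates.

```python
import asyncio
from typing import Any, Awaitable, Callable

async def handle_request(
    fut: asyncio.Future, work: Callable[[], Awaitable[Any]]
) -> None:
    try:
        fut.set_result(await work())
    except BaseException as e:
        if not fut.done():
            # Deliver the error so the waiter wakes up instead of hanging.
            fut.set_exception(e)
        if not isinstance(e, Exception):
            # CancelledError, KeyboardInterrupt, SystemExit: propagate
            # so task cancellation during shutdown works correctly.
            raise
```

Ordinary `Exception`s are swallowed after delivery (the waiter sees them via the future), while cancellation-style exceptions still unwind the task.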
Drain all pending sub-LLM tasks when the agent completes or times out,
ensuring metrics and trajectory updates are finalized before scoring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
snimu and others added 4 commits March 15, 2026 21:26
Use `prompt` instead of `prompt_messages`, and include all required
fields (completion, tokens, reward, advantage, is_truncated,
trajectory_id) to match the TrajectoryStep TypedDict.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace model-name substring matching with an explicit X-RLM-Role: sub
HTTP header set by the OC plugin. The interception server now captures
all request headers in the intercept dict for general-purpose use.

Removes: RLM_SUB_MODEL_ID env var, sub_model_identifier param,
RLM_LLM_SUBCALL_VIA_PROXY env var (llm-subcall now routes through
OPENAI_BASE_URL automatically when set).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace model-name substring matching with an explicit X-RLM-Role: sub
HTTP header set by the OC plugin. The interception server now captures
all request headers (lowercased) for general-purpose use.

Removes:
- RLM_SUB_MODEL_ID env var and sub_model_identifier param
- RLM_LLM_SUBCALL_VIA_PROXY env var (llm-subcall now routes through
  OPENAI_BASE_URL automatically when set)
- Model-name substring matching

Headers are stored with lowercase keys to handle HTTP/2 case
normalization correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mikasenghaas (Member) left a comment:


nice, lgtm!

self.include_sub_llm_in_trajectory = include_sub_llm_in_trajectory
self._sub_llm_semaphore = asyncio.Semaphore(max_sub_llm_parallelism)

kwargs.setdefault("run_command_template", RLM_RUN_COMMAND_TEMPLATE)
Member:

ah, i actually like this as a pattern to distinguish env args from this env or parent envs

def _is_sub_llm_request(intercept: dict[str, Any]) -> bool:
    return intercept.get("headers", {}).get("x-rlm-role") == "sub"

# ------------------------------------------------------------------
Member:

i dont like all these section dividers

# Request routing
# ------------------------------------------------------------------

async def get_prompt_messages(self, state: State) -> Messages:
Member:

maybe put the first part of this method (which i think is shared with cli agent env) in a shared util so we dont dup code here?

)
# Only count non-empty turns (skip the synthetic agent-completed step)
if prompt:
    self._update_main_metrics(state, response)
Member:

what's the reason to compute metrics at rollout runtime as opposed to in the rubric reward fn?

@snimu (Contributor, Author):

Copied it from the RLMEnv where some time ago I think I was planning to live-update metrics bc the rollouts were so long, but then didn't; I can undo it.

@snimu (Contributor, Author):

oh wait there is a reason: the sub-LLM calls aren't guaranteed to be in the trajectory (and by default, they aren't), so we need to collect those metrics incrementally. I could do it afterward for the root-LLM but it saves ~1 line of code, so it's probably not worth it.

snimu and others added 2 commits March 16, 2026 14:20
Move the tunnel/completion/timeout polling loop from get_prompt_messages
into _poll_next_request on CliAgentEnv. OpenCodeRLMEnv now calls this
helper instead of duplicating the loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
snimu and others added 4 commits March 16, 2026 15:33
… in rubric

- Extract _poll_next_request into CliAgentEnv so the RLM env reuses
  the polling loop instead of duplicating it
- Move main-agent metric computation from get_model_response override
  to the rubric (computed from trajectory at scoring time)
- Remove get_model_response override and _update_main_metrics
- Add cleanup handler to cancel in-flight sub-LLM tasks on rollout end

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ${OPENAI_MODEL} shell expansion with a fixed "intercepted/model"
provider/model pair, matching the opencode_harbor pattern. The model
name doesn't matter since all API calls go through the interception
proxy. This fixes the ProviderModelNotFoundError when users pass model
names without a provider/ prefix (e.g. gpt-5-mini instead of
openai/gpt-5-mini).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use self.logger instead of module-level logger
- Remove stale get_model_response override and _update_main_metrics
  (main metrics are now computed in the rubric from trajectory)
- Remove unused imports (logging, MessageType, SamplingArgs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The get_model_response override and _update_main_metrics were removed
but the rubric was still reading main_* from state (always 0). Now
main_turns/main_prompt_tokens/main_completion_tokens are computed from
state["trajectory"] at scoring time. Sub-LLM metrics remain in state
(accumulated during rollout).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Filter out trajectory steps with extras.is_sub_llm_call=True when
computing main_turns/main_prompt_tokens/main_completion_tokens.
Prevents double-counting when include_sub_llm_in_trajectory is enabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
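The double-counting fix in this commit can be sketched as a scoring-time pass over the trajectory. This is an illustrative sketch only: the helper name `main_metrics` and the `prompt_tokens`/`completion_tokens` step fields are assumptions (the PR only confirms that steps carry an `extras.is_sub_llm_call` flag and token counts).

```python
from typing import Any

def main_metrics(trajectory: list[dict[str, Any]]) -> dict[str, int]:
    # Skip steps flagged as sub-LLM calls so they are not double-counted
    # when include_sub_llm_in_trajectory is enabled.
    main = [
        step for step in trajectory
        if not step.get("extras", {}).get("is_sub_llm_call", False)
    ]
    return {
        "main_turns": len(main),
        "main_prompt_tokens": sum(s.get("prompt_tokens", 0) for s in main),
        "main_completion_tokens": sum(s.get("completion_tokens", 0) for s in main),
    }
```

Computing these from `state["trajectory"]` at scoring time (rather than during the rollout) keeps the main-agent metrics in one place, while sub-LLM metrics still accumulate in state during the rollout.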
snimu and others added 2 commits March 19, 2026 15:43
…king

When include_sub_llm_in_trajectory is enabled, sub-LLM steps can be
appended before the first main step, making len(trajectory) > 0. Use
has_main_step check instead so state["prompt"] is still set correctly
on the first main-agent turn.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace pipe (cat | opencode run | tee) with redirect + cat so the
script exits with opencode's actual exit code. The pipe masked failures
because set -e only checks the last command in a pipeline (tee).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@cursor bot left a comment:

Cursor Bugbot has reviewed your changes and found 1 potential issue.


set -e would exit the script before _oc_exit capture and log emission.
Temporarily disable with set +e, capture exit code, re-enable, then
cat logs and exit with the real code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@snimu snimu merged commit 5d84c1b into main Mar 19, 2026
6 checks passed
mikasenghaas added a commit that referenced this pull request Mar 19, 2026
…cli_agent_env

Revert opencode_env.py config builder to use shell variable expansion
instead of hardcoded "intercepted/model" (regression from #1023).

Move is_sub_llm_call-aware first-turn prompt check from CliAgentEnv
into OpenCodeRLMEnv where it belongs, restoring the simple
len(trajectory)==0 check in the base class.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mikasenghaas added a commit that referenced this pull request Mar 19, 2026
…cli_agent_env (#1042)

* fix: revert opencode_env config regression and move RLM logic out of cli_agent_env

Revert opencode_env.py config builder to use shell variable expansion
instead of hardcoded "intercepted/model" (regression from #1023).

Move is_sub_llm_call-aware first-turn prompt check from CliAgentEnv
into OpenCodeRLMEnv where it belongs, restoring the simple
len(trajectory)==0 check in the base class.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update test to match reverted shell variable expansion in opencode config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
