Conversation
OpenCodeRLMEnv extends OpenCodeEnv with the snimu/oc RLM plugin for recursive sub-LLM calls. Sets env vars so the plugin routes llm-subcall and subagent calls through the interception proxy with model="sub", enabling concurrent handling and separate token tracking. Includes opencode-rlm-test environment with 3 tasks exercising basic bash, llm-subcall, and subagent capabilities. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers constructor defaults, config generation (including shell expansion), run command content, env var setup, sub-LLM detection, state initialization, metrics tracking, and monitor rubric. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SWE-Bench Docker images use sh (dash) as default shell, which doesn't support the bash-only `pipefail` option. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
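The incompatibility described above can be guarded against directly. A minimal sketch (illustrative, not the repo's actual run script): probe for `pipefail` support in a subshell before enabling it, so the same script runs under both bash and dash.

```shell
# `pipefail` is a bash option; classic POSIX sh (dash on Debian-based images)
# rejects it. Probe in a subshell so the failed `set -o` can't kill the script.
if (set -o pipefail) 2>/dev/null; then
  set -o pipefail
  echo "pipefail enabled"
else
  echo "pipefail unsupported in this shell"
fi
```

Note that newer dash releases have started accepting `pipefail`, so the probe is the portable check rather than testing the shell's name.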
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for all 3 issues found in the latest run.
- ✅ Fixed: Fire-and-forget task may be garbage collected
- Added _sub_llm_tasks set to track task references and prevent garbage collection, following the pattern used in parent class.
- ✅ Fixed: New environment not listed in environments README
- Added opencode_rlm_test to environments/README.md under experimental environments section and pattern reference section.
- ✅ Fixed: Swallowing `CancelledError` prevents proper task cancellation
- Added raise statement after exception handling in _handle_sub_llm_request to properly propagate CancelledError and other exceptions.
Or push these changes by commenting:
@cursor push 362a510ff0
Preview (362a510ff0)
diff --git a/environments/README.md b/environments/README.md
--- a/environments/README.md
+++ b/environments/README.md
@@ -45,6 +45,9 @@
- **RLMEnv (Recursive Language Model)**
- **rlm_secrets**: Puzzle environment testing RLM functionality including root-level tools, sub-LLM tool use, and file operations.
+- **OpenCodeRLMEnv (OpenCode with RLM plugin)**
+ - **opencode_rlm_test**: Smoke-test environment for `OpenCodeRLMEnv` demonstrating concurrent sub-LLM handling with the RLM plugin.
+
- **HarborEnv / CliAgentEnv (CLI agent sandboxes)**
- **opencode_harbor**: Runs the OpenCode CLI agent on Harbor tasks with API interception via Prime Tunnel.
- **terminus_harbor**: Runs the Terminus agent on Harbor tasks with API interception via Prime Tunnel.
@@ -75,6 +78,7 @@
- **CLI agent sandboxes**: `opencode_harbor`, `terminus_harbor`
- **MCP integration**: `mcp_search_env`
- **RLM (recursive LLM)**: `rlm_secrets`
+- **OpenCode RLM integration**: `opencode_rlm_test`
- **Environment and rubric composition**: `math_group`, `math_python`, `wiki_search`
- **Procedural datasets**: `reasoning_gym_env`
- **Multimodal**: `mmmu`
diff --git a/verifiers/envs/experimental/opencode_rlm_env.py b/verifiers/envs/experimental/opencode_rlm_env.py
--- a/verifiers/envs/experimental/opencode_rlm_env.py
+++ b/verifiers/envs/experimental/opencode_rlm_env.py
@@ -167,6 +167,7 @@
self.sub_timeout_ms = sub_timeout_ms
self.include_sub_llm_in_trajectory = include_sub_llm_in_trajectory
self._sub_llm_semaphore = asyncio.Semaphore(max_sub_llm_parallelism)
+ self._sub_llm_tasks: set[asyncio.Task[None]] = set()
kwargs.setdefault("run_command_template", RLM_RUN_COMMAND_TEMPLATE)
@@ -304,9 +305,11 @@
if self._is_sub_llm_request(intercept):
# Fire-and-forget: handled concurrently outside the loop
- asyncio.create_task(
+ task = asyncio.create_task(
self._handle_sub_llm_request(state, request_id, intercept)
)
+ self._sub_llm_tasks.add(task)
+ task.add_done_callback(self._sub_llm_tasks.discard)
continue
# Main-agent request → return to rollout loop
@@ -349,6 +352,7 @@
except BaseException as e:
error = e
logger.warning("Sub-LLM request %s failed: %s", request_id, e)
+ raise
finally:
if intercept.get("stream"):
await synthesize_stream(intercept, response, error)
Replaced by opencode-rlm-swe in research-environments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Store asyncio.create_task references in state["_sub_llm_tasks"] set and use done callbacks to clean up. Prevents Python from silently dropping in-flight sub-LLM requests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows CancelledError and KeyboardInterrupt to propagate for proper task cancellation during shutdown. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dler Catch BaseException to always resolve the HTTP future (preventing hangs), but re-raise non-Exception types (CancelledError, etc.) after delivery so task cancellation still propagates correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
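A minimal sketch of the handler shape this commit describes (names are illustrative, not the repo's exact signatures): catch `BaseException` so the waiting HTTP future is always resolved and nothing hangs, but re-raise non-`Exception` types after delivery so cancellation still propagates.

```python
import asyncio

async def handle_sub_llm_request(future: asyncio.Future, call) -> None:
    """Always resolve `future` with (response, error); re-raise CancelledError etc."""
    response, error = None, None
    try:
        response = await call()
    except BaseException as e:  # noqa: BLE001 — intentionally broad, see re-raise below
        error = e
    finally:
        if not future.done():
            future.set_result((response, error))  # deliver result or error, never hang
    # CancelledError / KeyboardInterrupt are BaseException but not Exception:
    # re-raise them so task cancellation propagates correctly.
    if error is not None and not isinstance(error, Exception):
        raise error
```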
Drain all pending sub-LLM tasks when the agent completes or times out, ensuring metrics and trajectory updates are finalized before scoring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
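The draining step might look roughly like this (a sketch under assumed names; the real helper lives in the env and reads the task set from state): wait for in-flight tasks with a timeout, then cancel stragglers so scoring is never blocked indefinitely.

```python
import asyncio

async def drain_sub_llm_tasks(tasks: set[asyncio.Task], timeout: float = 30.0) -> None:
    """Wait for pending sub-LLM tasks before scoring; cancel whatever remains."""
    pending = {t for t in tasks if not t.done()}
    if not pending:
        return
    _done, still_pending = await asyncio.wait(pending, timeout=timeout)
    for t in still_pending:
        t.cancel()  # give up on stragglers so metrics finalization can proceed
```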
Use `prompt` instead of `prompt_messages`, and include all required fields (completion, tokens, reward, advantage, is_truncated, trajectory_id) to match the TrajectoryStep TypedDict. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace model-name substring matching with an explicit X-RLM-Role: sub HTTP header set by the OC plugin. The interception server now captures all request headers in the intercept dict for general-purpose use. Removes: RLM_SUB_MODEL_ID env var, sub_model_identifier param, RLM_LLM_SUBCALL_VIA_PROXY env var (llm-subcall now routes through OPENAI_BASE_URL automatically when set). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace model-name substring matching with an explicit X-RLM-Role: sub HTTP header set by the OC plugin. The interception server now captures all request headers (lowercased) for general-purpose use. Removes: - RLM_SUB_MODEL_ID env var and sub_model_identifier param - RLM_LLM_SUBCALL_VIA_PROXY env var (llm-subcall now routes through OPENAI_BASE_URL automatically when set) - Model-name substring matching Headers are stored with lowercase keys to handle HTTP/2 case normalization correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
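The detection scheme above can be sketched in a few lines (assumed shapes, not the repo's interception server): the proxy lowercases header keys when building the intercept dict, so the role check is immune to HTTP/2 header-case normalization.

```python
from typing import Any

def build_intercept(headers: list[tuple[str, str]], body: bytes) -> dict[str, Any]:
    """Store request headers with lowercase keys alongside the body."""
    return {
        "headers": {k.lower(): v for k, v in headers},
        "body": body,
    }

def is_sub_llm_request(intercept: dict[str, Any]) -> bool:
    # Explicit role header set by the OC plugin replaces model-name matching.
    return intercept.get("headers", {}).get("x-rlm-role") == "sub"
```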
self.include_sub_llm_in_trajectory = include_sub_llm_in_trajectory
self._sub_llm_semaphore = asyncio.Semaphore(max_sub_llm_parallelism)

kwargs.setdefault("run_command_template", RLM_RUN_COMMAND_TEMPLATE)
ah, i actually like this as a pattern to distinguish env args from this env or parent envs
def _is_sub_llm_request(intercept: dict[str, Any]) -> bool:
    return intercept.get("headers", {}).get("x-rlm-role") == "sub"

# ------------------------------------------------------------------
i dont like all these section dividers
# Request routing
# ------------------------------------------------------------------

async def get_prompt_messages(self, state: State) -> Messages:
maybe put the first part of this method (which i think is shared with cli agent env) in a shared util so we dont dup code here?
)
# Only count non-empty turns (skip the synthetic agent-completed step)
if prompt:
    self._update_main_metrics(state, response)
what's the reason to compute metrics at rollout runtime as opposed to in the rubric reward fn?
Copied it from the RLMEnv where some time ago I think I was planning to live-update metrics bc the rollouts were so long, but then didn't; I can undo it.
oh wait there is a reason: the sub-LLM calls aren't guaranteed to be in the trajectory (and by default, they aren't), so we need to collect those metrics incrementally. I could do it afterward for the root-LLM but it would save only ~1 line of code, so it's probably not worth it.
Move the tunnel/completion/timeout polling loop from get_prompt_messages into _poll_next_request on CliAgentEnv. OpenCodeRLMEnv now calls this helper instead of duplicating the loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… in rubric - Extract _poll_next_request into CliAgentEnv so the RLM env reuses the polling loop instead of duplicating it - Move main-agent metric computation from get_model_response override to the rubric (computed from trajectory at scoring time) - Remove get_model_response override and _update_main_metrics - Add cleanup handler to cancel in-flight sub-LLM tasks on rollout end Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ${OPENAI_MODEL} shell expansion with a fixed "intercepted/model"
provider/model pair, matching the opencode_harbor pattern. The model
name doesn't matter since all API calls go through the interception
proxy. This fixes the ProviderModelNotFoundError when users pass model
names without a provider/ prefix (e.g. gpt-5-mini instead of
openai/gpt-5-mini).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use self.logger instead of module-level logger - Remove stale get_model_response override and _update_main_metrics (main metrics are now computed in the rubric from trajectory) - Remove unused imports (logging, MessageType, SamplingArgs) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The get_model_response override and _update_main_metrics were removed but the rubric was still reading main_* from state (always 0). Now main_turns/main_prompt_tokens/main_completion_tokens are computed from state["trajectory"] at scoring time. Sub-LLM metrics remain in state (accumulated during rollout). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Filter out trajectory steps with extras.is_sub_llm_call=True when computing main_turns/main_prompt_tokens/main_completion_tokens. Prevents double-counting when include_sub_llm_in_trajectory is enabled. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
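The scoring-time computation described in the two commits above can be sketched as follows (the step dicts here are simplified stand-ins for the repo's `TrajectoryStep` TypedDict): sum main-agent turns and tokens from the trajectory while skipping steps flagged `extras.is_sub_llm_call`, so nothing is double-counted when sub-LLM steps are logged.

```python
def main_agent_metrics(trajectory: list[dict]) -> dict:
    """Compute main-agent metrics from the trajectory, excluding sub-LLM steps."""
    main_steps = [
        s for s in trajectory
        if not s.get("extras", {}).get("is_sub_llm_call", False)
    ]
    return {
        "main_turns": len(main_steps),
        "main_prompt_tokens": sum(s.get("prompt_tokens", 0) for s in main_steps),
        "main_completion_tokens": sum(s.get("completion_tokens", 0) for s in main_steps),
    }
```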
…king When include_sub_llm_in_trajectory is enabled, sub-LLM steps can be appended before the first main step, making len(trajectory) > 0. Use has_main_step check instead so state["prompt"] is still set correctly on the first main-agent turn. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace pipe (cat | opencode run | tee) with redirect + cat so the script exits with opencode's actual exit code. The pipe masked failures because set -e only checks the last command in a pipeline (tee). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
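The masking behavior this commit fixes is easy to demonstrate: under `set -e`, only the last command of a pipeline determines the pipeline's exit status, so a failing producer piped into `cat` (or `tee`) does not abort the script.

```shell
# `false` fails, but the pipeline's status is `cat`'s (0), so `set -e`
# does not trigger and the script keeps running.
sh -c 'set -e; false | cat; echo "still running"'
pipeline_status=$?
echo "wrapper saw exit $pipeline_status"
```

This is why the run command switched from `cat | opencode run | tee` to a redirect followed by `cat`: the script's exit status then reflects opencode's real exit code.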
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
set -e would exit the script before _oc_exit capture and log emission. Temporarily disable with set +e, capture exit code, re-enable, then cat logs and exit with the real code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
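The capture pattern from this commit, sketched in POSIX sh (illustrative names; `run_agent` stands in for the real `opencode run` invocation, and the subshell exists only so the example can inspect the wrapper's exit status):

```shell
run_agent() { echo "agent output"; return 3; }

(
  set -e
  set +e                          # don't let errexit kill the script here
  run_agent > /tmp/agent.log 2>&1
  _oc_exit=$?                     # capture the agent's real exit code
  set -e
  cat /tmp/agent.log              # emit logs even on failure
  exit "$_oc_exit"                # propagate the real exit code
)
status=$?
echo "wrapper exited with $status"
```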
…cli_agent_env Revert opencode_env.py config builder to use shell variable expansion instead of hardcoded "intercepted/model" (regression from #1023). Move is_sub_llm_call-aware first-turn prompt check from CliAgentEnv into OpenCodeRLMEnv where it belongs, restoring the simple len(trajectory)==0 check in the base class. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cli_agent_env (#1042)

* fix: revert opencode_env config regression and move RLM logic out of cli_agent_env

Revert opencode_env.py config builder to use shell variable expansion instead of hardcoded "intercepted/model" (regression from #1023). Move is_sub_llm_call-aware first-turn prompt check from CliAgentEnv into OpenCodeRLMEnv where it belongs, restoring the simple len(trajectory)==0 check in the base class.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update test to match reverted shell variable expansion in opencode config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Description
opencode_rlm_env
Type of Change
Testing
Ran `uv run pytest` locally.
Checklist
Note
Medium Risk
Adds a new sandboxed agent environment that handles concurrent sub-LLM requests and changes interception/config plumbing; concurrency and request-routing changes could affect rollout stability and token accounting.
Overview
Introduces `OpenCodeRLMEnv`, extending `OpenCodeEnv` to support Recursive Language Model sub-agent calls via the `snimu/oc` plugin, including sandbox bootstrapping (bun + plugin install), header-based sub-LLM detection, and concurrent handling of sub-requests with semaphore-limited parallelism and optional trajectory logging.
Refactors `CliAgentEnv` request polling into `_poll_next_request()` to support the new concurrent routing, adjusts first-turn prompt capture to ignore sub-LLM steps, and updates `OpenCodeEnv`'s generated OpenCode config to use a fixed `intercepted/model` provider mapping.
Enhances the interception layer to record incoming HTTP headers per request (used for sub-LLM role detection), adds a monitor rubric for main vs sub-LLM token/turn metrics, adds comprehensive tests for config rendering and metrics, and updates environment docs to list `OpenCodeEnv`/`OpenCodeRLMEnv`.
Written by Cursor Bugbot for commit 8bf7d0a.