feat: add nemotron tool preset (bash + str_replace aliases for Anthropic schema compatibility) #2554

Closed
juanmichelini wants to merge 2 commits into main from openhands/nemotron-tool-preset

juanmichelini (Collaborator) commented Mar 24, 2026

Summary

This PR adds a new nemotron tool preset for Nemotron-3 Super (nvidia/nemotron-3-super-120b-a12b), which was fine-tuned on trajectories that use Anthropic's tool schema. The preset exposes:

  • BashTool: A tool named bash (instead of terminal) that wraps TerminalExecutor
  • StrReplaceTool: A tool named str_replace (instead of file_editor) that wraps FileEditorExecutor
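The aliasing idea can be sketched in a few lines: the tool's public name changes while execution is delegated to the same executor. This is a minimal, self-contained illustration rather than the PR's code; `NamedTool`, `fake_terminal_executor`, and the `command` argument are hypothetical stand-ins for the SDK's tool definition classes and TerminalExecutor.

```python
from dataclasses import dataclass
from typing import Callable


def fake_terminal_executor(arguments: dict) -> str:
    """Hypothetical stand-in for TerminalExecutor: echoes the command it would run."""
    return f"ran: {arguments['command']}"


@dataclass(frozen=True)
class NamedTool:
    """Hypothetical tool wrapper: a public name plus the executor it delegates to."""

    name: str
    executor: Callable[[dict], str]

    def __call__(self, arguments: dict) -> str:
        # Delegate to the wrapped executor; the name plays no role in execution.
        return self.executor(arguments)


# Same executor, two public names: only the name the model sees differs.
terminal = NamedTool(name="terminal", executor=fake_terminal_executor)
bash = NamedTool(name="bash", executor=fake_terminal_executor)

assert terminal({"command": "ls"}) == bash({"command": "ls"})
```

Because the executor is shared, behavior stays identical under either name in this sketch; only the name presented to the model changes.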

Problem

Two evaluation runs of nvidia/nemotron-3-super-120b-a12b showed a 63-67% conversation error rate, almost entirely caused by the model calling tool names that don't exist in OpenHands:

| Tool name model called | Should have called |
|---|---|
| str_replace | file_editor (with command="str_replace") |
| bash | terminal |
| command | terminal |
| execute | terminal |

The model's behavior is correct for Anthropic's schema: it was trained on the str_replace_based_edit_tool / bash tool interface. The problem is purely a name mismatch.
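To make the failure mode concrete, the sketch below checks called tool names against a set of registered names. It is illustrative only: `unknown_tool_calls` is a hypothetical helper, and the two registered sets contain just the tool names mentioned in this PR, not the full OpenHands tool registry.

```python
def unknown_tool_calls(called: list[str], registered: set[str]) -> list[str]:
    """Return the called tool names that are not registered, in call order."""
    return [name for name in called if name not in registered]


# Tool names the model called, taken from the table above.
calls = ["str_replace", "bash", "command", "execute"]

# Assumed name sets: only the names this PR description mentions.
default_names = {"terminal", "file_editor"}
nemotron_names = {"bash", "str_replace"}

print(unknown_tool_calls(calls, default_names))
# ['str_replace', 'bash', 'command', 'execute']
print(unknown_tool_calls(calls, nemotron_names))
# ['command', 'execute']
```

In this sketch only `bash` and `str_replace` resolve under the Nemotron names, since those are the two names the preset description lists.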

Solution

Following the existing pattern (gemini.py, gpt5.py), this PR adds a nemotron preset that exposes tools under the names the model expects:

New files:

  • openhands-tools/openhands/tools/nemotron/bash/ - BashTool implementation
  • openhands-tools/openhands/tools/nemotron/str_replace/ - StrReplaceTool implementation
  • openhands-tools/openhands/tools/preset/nemotron.py - Preset configuration
  • tests/tools/nemotron/ - Test coverage for new tools and preset (21 tests)

CI integration:

  • Added nemotron to tool_preset dropdown in run-eval.yml workflow
  • Added nemotron to tool_preset dropdown in integration-runner.yml workflow
  • Updated ToolPresetType and get_tools_for_preset() in tests/integration/base.py
  • Updated argparse choices in tests/integration/run_infer.py
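The integration changes follow a common pattern: one `Literal` type lists the allowed preset names, a dispatch function maps each name to its tools, and the argparse choices accept the same names. The sketch below is a simplified assumption of that pattern, not the contents of tests/integration/base.py or run_infer.py: it covers only two presets, returns tool names as strings instead of SDK tool objects, and the `--tool-preset` flag name is hypothetical.

```python
import argparse
from typing import Literal, get_args

# Assumed subset of preset names; the real literal includes other presets.
ToolPresetType = Literal["default", "nemotron"]


def get_tools_for_preset(preset: ToolPresetType) -> list[str]:
    """Map a preset name to tool names (strings here, for illustration only)."""
    if preset == "nemotron":
        return ["bash", "str_replace"]
    if preset == "default":
        return ["terminal", "file_editor"]
    raise ValueError(f"Unknown tool preset: {preset!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose choices are derived from the Literal type."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--tool-preset",  # hypothetical flag name
        choices=get_args(ToolPresetType),
        default="default",
    )
    return parser


args = build_parser().parse_args(["--tool-preset", "nemotron"])
print(get_tools_for_preset(args.tool_preset))
# ['bash', 'str_replace']
```

Deriving the choices from the `Literal` with `typing.get_args` is one way to keep the type and the CLI in sync; the PR itself updates each place separately.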

Exports added:

  • get_nemotron_agent, get_nemotron_tools from openhands.tools.preset

Usage

from openhands.tools.preset import get_nemotron_agent

agent = get_nemotron_agent(llm=llm)

Or using the tools directly:

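from openhands.sdk import Agent, Tool
from openhands.tools.nemotron import NEMOTRON_TOOLS, BashTool, StrReplaceTool
from openhands.tools.task_tracker import TaskTrackerTool

# `llm` is an already-configured LLM instance.
agent = Agent(
    llm=llm,
    tools=[*NEMOTRON_TOOLS, Tool(name=TaskTrackerTool.name)],
)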

Testing via CI

To test the nemotron preset through the evaluation workflow:

  1. Navigate to Run Eval workflow
  2. Select the branch openhands/nemotron-tool-preset as sdk_ref (check 'Allow unreleased branches')
  3. Set tool_preset to nemotron
  4. Select appropriate model (e.g., a Nemotron model if available)

No additional PRs are needed in the evaluation or benchmarks repos: the tool_preset is passed through to the SDK.

Fixes #2553

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:ebba9d4-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-ebba9d4-python \
  ghcr.io/openhands/agent-server:ebba9d4-python

All tags pushed for this build

ghcr.io/openhands/agent-server:ebba9d4-golang-amd64
ghcr.io/openhands/agent-server:ebba9d4-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:ebba9d4-golang-arm64
ghcr.io/openhands/agent-server:ebba9d4-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:ebba9d4-java-amd64
ghcr.io/openhands/agent-server:ebba9d4-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:ebba9d4-java-arm64
ghcr.io/openhands/agent-server:ebba9d4-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:ebba9d4-python-amd64
ghcr.io/openhands/agent-server:ebba9d4-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:ebba9d4-python-arm64
ghcr.io/openhands/agent-server:ebba9d4-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:ebba9d4-golang
ghcr.io/openhands/agent-server:ebba9d4-java
ghcr.io/openhands/agent-server:ebba9d4-python

About Multi-Architecture Support

  • Each variant tag (e.g., ebba9d4-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., ebba9d4-python-amd64) are also available if needed

github-actions bot (Contributor) commented Mar 24, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

github-actions bot (Contributor) commented Mar 24, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

github-actions bot (Contributor) commented Mar 24, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-tools/openhands/tools/nemotron/bash | | | | |
| definition.py | 92 | 61 | 33% | 55–58, 60–61, 96, 100, 102–103, 105, 107–113, 115, 121, 123, 128, 130–132, 134, 136–140, 144–145, 148–150, 152–153, 155–158, 162–164, 169, 173–178, 180–181, 183, 225, 227–229, 231, 236 |
| impl.py | 57 | 42 | 26% | 41, 48–50, 55–56, 58–64, 66–67, 69–72, 74, 76, 90–92, 99, 102, 105, 112, 115, 126–132, 135, 140–141, 143, 147–148 |
| openhands-tools/openhands/tools/nemotron/str_replace | | | | |
| definition.py | 66 | 35 | 46% | 115, 117–119, 121–122, 124–127, 135–136, 141–142, 144–145, 147–148, 150–151, 153–155, 157, 219, 221, 223–225, 227–228, 235, 237–238, 245 |
| impl.py | 23 | 14 | 39% | 31–32, 44–47, 58–59, 61, 71, 80–81, 84–85 |
| openhands-tools/openhands/tools/preset | | | | |
| nemotron.py | 32 | 21 | 34% | 26–27, 29–31, 33–34, 36, 50, 52–53, 55, 60–61, 63–64, 69–70, 83, 86, 94 |
| TOTAL | 21797 | 10987 | 49% | |

juanmichelini (Collaborator, Author) commented

Need to retest (missed LiteLLM param)

juanmichelini force-pushed the openhands/nemotron-tool-preset branch 2 times, most recently from be1e9f0 to 0b1429e on March 25, 2026 22:11
juanmichelini pushed a commit to OpenHands/benchmarks that referenced this pull request Mar 26, 2026
Add gpt5 and nemotron to:
- ToolPresetType literal in benchmarks/utils/models.py
- get_tools_for_preset() in benchmarks/swebench/run_infer.py
- get_tools_for_preset() in benchmarks/swebenchmultilingual/run_infer.py

This enables evaluations with:
- gpt5: uses apply_patch tool for file editing
- nemotron: uses bash/str_replace tools (Anthropic-compatible)

These presets are already supported in the software-agent-sdk but were
missing from the benchmarks implementation.

Related: OpenHands/software-agent-sdk#2554

Co-authored-by: openhands <openhands@all-hands.dev>
juanmichelini force-pushed the openhands/nemotron-tool-preset branch from 0b1429e to c8f7f90 on March 27, 2026 14:48
juanmichelini pushed a commit to OpenHands/benchmarks that referenced this pull request Mar 27, 2026

enyst (Collaborator) left a comment

@juanmichelini Please see the discussion here: #2584 (comment)

I suggest we first think about how we do this? Otherwise we'd have to re-eval if we change the implementation deeply, I guess.

juanmichelini force-pushed the openhands/nemotron-tool-preset branch from c8f7f90 to 6611469 on March 27, 2026 20:01
juanmichelini pushed a commit to OpenHands/benchmarks that referenced this pull request Mar 27, 2026
Add nemotron tool preset back to enable testing before merge.
This PR and OpenHands/software-agent-sdk#2554 will be merged
simultaneously or not at all.

Changes:
- Added 'nemotron' to ToolPresetType in benchmarks/utils/models.py
- Added nemotron case to get_tools_for_preset() in benchmarks/utils/tools.py
- Added test_get_tools_for_preset_nemotron() to tests/test_tools.py

Co-authored-by: openhands <openhands@all-hands.dev>
…pic schema compatibility)

Add a new 'nemotron' tool preset for Nemotron-3 Super (nvidia/nemotron-3-super-120b-a12b)
which was fine-tuned on trajectories using Anthropic's tool schema. The preset exposes:

- BashTool: A tool named 'bash' (instead of 'terminal') that wraps TerminalExecutor
- StrReplaceTool: A tool named 'str_replace' (instead of 'file_editor') that wraps
  FileEditorExecutor

This fixes the 63-67% conversation error rate observed in Nemotron evaluations,
caused entirely by tool name mismatches where the model called tools like 'bash',
'str_replace', 'command', 'execute' that don't exist in the default OpenHands schema.

New files:
- openhands-tools/openhands/tools/nemotron/bash/ - BashTool implementation
- openhands-tools/openhands/tools/nemotron/str_replace/ - StrReplaceTool implementation
- openhands-tools/openhands/tools/preset/nemotron.py - Preset configuration
- tests/tools/nemotron/ - Test coverage for new tools and preset

Exports added:
- get_nemotron_agent, get_nemotron_tools from openhands.tools.preset

Fixes #2553

Co-authored-by: openhands <openhands@all-hands.dev>

Add nemotron to:
- ToolPresetType literal in tests/integration/base.py
- get_tools_for_preset() function to return nemotron tools
- run-eval.yml workflow tool_preset dropdown
- integration-runner.yml workflow tool_preset dropdown
- run_infer.py argparse choices
- test_tool_presets.py for nemotron validation

This enables running evaluations with TOOL_PRESET=nemotron to test the
Nemotron-3 Super model with its native tool names (bash, str_replace).

Co-authored-by: openhands <openhands@all-hands.dev>
juanmichelini (Collaborator, Author) commented

@enyst thanks for the insights! Closing this, going for the #2684 approach instead.
