
feat: Add VeriHop multimodal multi-hop environment #1049

Open
nevasini1 wants to merge 1 commit into PrimeIntellect-ai:main from nevasini1:feature/verihop-multimodal-env

Conversation


nevasini1 commented Mar 21, 2026

Why this PR

Verifiers already had solid text RLVR patterns and a basic multimodal path (e.g. single-turn MMMU-style prompts), but there was no first-class example of multi-hop visual reasoning: same image, dependent questions, optional tool use, and rewards that can reflect process (per-hop answers and grounding) as well as a final verifiable outcome.

VeriHop is that reference environment. It is meant to be easy to extend (more hop types, real image sources, curriculum) and to train the kinds of habits papers like HopChain emphasize—re-grounding, dependency across steps, long CoT stability—without tying the repo to a single benchmark or a brittle port.

What you get

  • A small core helper (add_image) so environment authors can append OpenAI-style image blocks to user messages without duplicating MMMU-style base64 boilerplate.
  • A packaged environment under environments/verihop: procedural scenes, a fixed 3-hop task family (count → count → combine), two rollout modes (plain multi-turn vs tool-augmented), and a rubric that can weight final boxed answer vs per-hop behavior (including optional grounding tags).
  • Docs, tests, and pytest wiring so the env is discoverable and CI can import it like other first-party envs.
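To make the first bullet concrete, here is a minimal standalone sketch of an `add_image`-style helper (hypothetical code, not the packaged implementation; the real helper lives in `verifiers.messages` and its signature may differ):

```python
# Minimal sketch of the add_image pattern (hypothetical; the packaged
# helper in verifiers.messages may differ in signature and validation).
import base64


def add_image(messages, image_bytes, mime="image/png"):
    """Append an OpenAI-style image_url block to the last user message."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    msg = messages[-1]
    # Promote plain string content to the list-of-parts form first.
    if isinstance(msg["content"], str):
        msg["content"] = [{"type": "text", "text": msg["content"]}]
    msg["content"].append({
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{b64}"},
    })
    return messages


messages = [{"role": "user", "content": "How many red circles are in the scene?"}]
add_image(messages, b"\x89PNG...", mime="image/png")
```

The point of centralizing this is exactly the PR's stated motivation: every multimodal env otherwise re-implements the same base64 data-URL boilerplate.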

Intent for reviewers

This is intentionally a vertical slice: enough to run real rollouts and iterate, not a claim of completeness. Follow-ups could add hub listing, reference docs, more image sources, variable hop counts, or splitting the core add_image change into its own PR if you prefer a minimal merge.

Notes

Happy to adjust scope, naming, or documentation to match how you want multimodal RLVR positioned in the project.


cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


(or publish to the hub). Use `name = "verihop"` / your env id and pass `use_tools` via the
environment args your runner supports.

See `examples/train_with_prime_rl.py` for a commented template.
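The doc fragment above maps onto a runner config along these lines (a hypothetical sketch; the actual key names depend on your runner's schema, e.g. the commented template in `examples/train_with_prime_rl.py`):

```toml
# Hypothetical config sketch; key names depend on your runner's schema.
[environment]
name = "verihop"     # env id as packaged under environments/verihop

[environment.args]
use_tools = true     # select the tool-augmented rollout mode
```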

New environment not added to environments README

Medium Severity

This PR adds a new verihop environment to environments/ but does not update environments/README.md to list it. The rule requires that any PR adding or removing an environment from the environments/ folder must update environments/README.md to reflect the change, including listing it under the appropriate category/pattern section.


Triggered by project rule: BugBot Instructions

Comment thread: verifiers/__init__.py

from .types import DatasetBuilder  # noqa # isort: skip
from .parsers.parser import Parser  # noqa # isort: skip
from .rubrics.rubric import Rubric  # noqa # isort: skip
from .messages import add_image

Core add_image API missing from reference docs

Low Severity

add_image is added as a new core user-facing function exported from verifiers/__init__.py and __all__, but docs/reference.md is not updated to document it. The rule requires that PRs adding core user-facing functionality update the relevant documentation, including docs/reference.md.


Triggered by project rule: BugBot Instructions


import re

def _norm_num(s: str) -> str:
    m = re.search(r"-?\d+", s)
    return m.group(0) if m else s.strip().lower()

_norm_num string comparison fails on leading zeros

Low Severity

_norm_num extracts the first digit sequence via regex and compares as a raw string, so numerically equivalent values like "07" and "7" are treated as unequal. If a model produces \boxed{07} or <hop_answer>07</hop_answer>, both outcome_reward and process_reward would incorrectly score as 0.0. Converting through int() (e.g., str(int(m.group(0)))) would fix this.


- Add verifiers.messages.add_image for OpenAI-style image_url parts
- New environments/verihop: procedural scene synthesis, VeriHopEnv,
  VeriHopToolEnv (StatefulToolEnv + hop advancement on text turns),
  VeriHopRubric (outcome + per-hop process), PIL tool helpers
- Docs and pytest coverage; pytest pythonpath for local verihop imports

Made-with: Cursor
nevasini1 force-pushed the feature/verihop-multimodal-env branch from af6fe29 to 6e48b1b on March 21, 2026 at 15:23
