feat: Add VeriHop multimodal multi-hop environment#1049
feat: Add VeriHop multimodal multi-hop environment#1049nevasini1 wants to merge 1 commit intoPrimeIntellect-ai:mainfrom
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| (or publish to the hub). Use `name = "verihop"` / your env id and pass `use_tools` via the | ||
| environment args your runner supports. | ||
|
|
||
| See `examples/train_with_prime_rl.py` for a commented template. |
There was a problem hiding this comment.
New environment not added to environments README
Medium Severity
This PR adds a new verihop environment to environments/ but does not update environments/README.md to list it. The rule requires that any PR adding or removing an environment from the environments/ folder must update environments/README.md to reflect the change, including listing it under the appropriate category/pattern section.
Triggered by project rule: BugBot Instructions
| from .types import DatasetBuilder # noqa # isort: skip | ||
| from .parsers.parser import Parser # noqa # isort: skip | ||
| from .rubrics.rubric import Rubric # noqa # isort: skip | ||
| from .messages import add_image |
There was a problem hiding this comment.
Core add_image API missing from reference docs
Low Severity
add_image is added as a new core user-facing function exported from verifiers/__init__.py and __all__, but docs/reference.md is not updated to document it. The rule requires that PRs adding core user-facing functionality update the relevant documentation, including docs/reference.md.
Additional Locations (1)
Triggered by project rule: BugBot Instructions
|
|
||
| def _norm_num(s: str) -> str: | ||
| m = re.search(r"-?\d+", s) | ||
| return m.group(0) if m else s.strip().lower() |
There was a problem hiding this comment.
_norm_num string comparison fails on leading zeros
Low Severity
_norm_num extracts the first digit sequence via regex and compares as a raw string, so numerically equivalent values like "07" and "7" are treated as unequal. If a model produces \boxed{07} or <hop_answer>07</hop_answer>, both outcome_reward and process_reward would incorrectly score as 0.0. Converting through int() (e.g., str(int(m.group(0)))) would fix this.
- Add verifiers.messages.add_image for OpenAI-style image_url parts - New environments/verihop: procedural scene synthesis, VeriHopEnv, VeriHopToolEnv (StatefulToolEnv + hop advancement on text turns), VeriHopRubric (outcome + per-hop process), PIL tool helpers - Docs and pytest coverage; pytest pythonpath for local verihop imports Made-with: Cursor
af6fe29 to
6e48b1b
Compare


Why this PR
Verifiers already had solid text RLVR patterns and a basic multimodal path (e.g. single-turn MMMU-style prompts), but there was no first-class example of multi-hop visual reasoning: same image, dependent questions, optional tool use, and rewards that can reflect process (per-hop answers and grounding) as well as a final verifiable outcome.
VeriHop is that reference environment. It is meant to be easy to extend (more hop types, real image sources, curriculum) and to train the kinds of habits papers like HopChain emphasize—re-grounding, dependency across steps, long CoT stability—without tying the repo to a single benchmark or a brittle port.
What you get
add_image) so environment authors can append OpenAI-style image blocks to user messages without duplicating MMMU-style base64 boilerplate.environments/verihop: procedural scenes, a fixed 3-hop task family (count → count → combine), two rollout modes (plain multi-turn vs tool-augmented), and a rubric that can weight final boxed answer vs per-hop behavior (including optional grounding tags).Intent for reviewers
This is intentionally a vertical slice: enough to run real rollouts and iterate, not a claim of completeness. Follow-ups could add hub listing, reference docs, more image sources, variable hop counts, or splitting the core
add_imagechange into its own PR if you prefer a minimal merge.Notes
Happy to adjust scope, naming, or documentation to match how you want multimodal RLVR positioned in the project.