fix: pass missing gts argument to _dump_generations call #528
Open
Alexi5000 wants to merge 1 commit into
The `RayPPOTrainer._dump_generations()` method requires a `gts` (ground truths) positional argument, but both `AgentLightningTrainer._train_step` and `EnvAgentLightningTrainer._train_step` omit it, causing a TypeError at runtime when `rollout_data_dir` is configured. Pass `gts=None` since ground truth is not available in agent mode training. Also remove a leftover `print(batch.batch.keys())` debug statement from both call sites. Fixes microsoft#492
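The failure mode described above can be reproduced in isolation. The sketch below is only an illustration: `BaseTrainer` is a hypothetical stand-in for `RayPPOTrainer`, its parameter list is assumed from the error message and the call sites in this PR, and the body returns a dict instead of writing files.

```python
class BaseTrainer:
    """Hypothetical stand-in for RayPPOTrainer; only the signature matters here."""

    def _dump_generations(self, inputs, outputs, gts, scores, reward_extra_infos_dict, dump_path):
        # Return the record instead of writing to disk so the sketch has no side effects.
        return {"inputs": inputs, "outputs": outputs, "gts": gts, "scores": scores}


def dump_without_gts(trainer):
    # Mirrors the call before this PR: the gts keyword is missing.
    return trainer._dump_generations(
        inputs=["question"],
        outputs=["answer"],
        scores=[1.0],
        reward_extra_infos_dict={},
        dump_path="rollouts",
    )


def dump_with_gts_none(trainer):
    # Mirrors the call after this PR: gts is passed explicitly as None.
    return trainer._dump_generations(
        inputs=["question"],
        outputs=["answer"],
        gts=None,
        scores=[1.0],
        reward_extra_infos_dict={},
        dump_path="rollouts",
    )


trainer = BaseTrainer()

try:
    dump_without_gts(trainer)
    error_message = None
except TypeError as exc:
    # Python reports the omitted required argument by name.
    error_message = str(exc)

record = dump_with_gts_none(trainer)
```

Here `error_message` names the missing `gts` argument, while the second call succeeds and carries `None` through to the record. What the real base class does with a `None` value inside its body is not shown by this sketch.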
Contributor
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR aligns rollout generation dumping behavior across the two VERL trainer implementations by removing a stray debug print and passing an explicit gts argument into _dump_generations.
Changes:
- Removed `print(batch.batch.keys())` debug output during rollout dumping.
- Added `gts=None` to `_dump_generations(...)` calls in both trainer implementations.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| contrib/agentlightning/contrib/algorithm/env_verl/trainer.py | Removes debug print and adds gts=None when dumping generations. |
| agentlightning/verl/trainer.py | Mirrors the same rollout dumping change for consistency. |
Comment on lines 420 to 426

    self._dump_generations(
        inputs=inputs,
        outputs=outputs,
        gts=None,
        scores=scores,
        reward_extra_infos_dict=reward_extra_infos_dict,
        dump_path=rollout_data_dir,
Comment on lines 520 to 526

    self._dump_generations(
        inputs=inputs,
        outputs=outputs,
        gts=None,
        scores=scores,
        reward_extra_infos_dict=reward_extra_infos_dict,
        dump_path=rollout_data_dir,
Author
@microsoft-github-policy-service agree
Summary
- Add `gts=None` argument to `_dump_generations()` in both `AgentLightningTrainer._train_step` and `EnvAgentLightningTrainer._train_step`
- Remove leftover `print(batch.batch.keys())` debug statement from both call sites

Problem
`RayPPOTrainer._dump_generations()` requires a `gts` (ground truths) positional argument, but both trainer subclasses omit it. This causes a `TypeError` at runtime when `rollout_data_dir` is configured:

    TypeError: RayPPOTrainer._dump_generations() missing 1 required positional argument: 'gts'

Ground truth is not available in agent mode training, so `None` is the correct value.

Fixes #492

Files Changed
- `agentlightning/verl/trainer.py`: `AgentLightningTrainer._train_step`
- `contrib/agentlightning/contrib/algorithm/env_verl/trainer.py`: `EnvAgentLightningTrainer._train_step`

Test plan
- With `rollout_data_dir` configured: no `TypeError`
- Without `rollout_data_dir`: no behavior change
- `_dump_generations` handles `gts=None` gracefully (base class already supports optional ground truths)