Skip to content

[skyrl-train] Flip grpo_norm_by_std default to false#1443

Open
taivu1998 wants to merge 2 commits intoNovaSky-AI:mainfrom
taivu1998:tdv/issue-869-grpo-default
Open

[skyrl-train] Flip grpo_norm_by_std default to false#1443
taivu1998 wants to merge 2 commits intoNovaSky-AI:mainfrom
taivu1998:tdv/issue-869-grpo-default

Conversation

@taivu1998
Copy link
Copy Markdown

@taivu1998 taivu1998 commented Apr 2, 2026

Summary

This PR updates the default value of trainer.algorithm.grpo_norm_by_std from true to false across the canonical config and utility surfaces.

It keeps the existing trainer plumbing intact, aligns the config docs with the new default, and adds regression coverage so the repo now explicitly protects:

  • the typed config default
  • the legacy YAML translation path
  • the GRPO utility default behavior
  • the trainer forwarding behavior for both default and explicit override cases

Closes #869.

What changed

  • Flip AlgorithmConfig.grpo_norm_by_std to False
  • Flip the legacy/base YAML default to false
  • Flip the GRPO utility defaults in ppo_utils.py to False
  • Update the configuration docs and migration wording
  • Remove stale hardcoded grpo_norm_by_std=True from shared test helpers
  • Add targeted tests for:
    • default behavior equals explicit false
    • explicit true
    • explicit false
    • trainer pass-through behavior

Why this approach

The runtime already forwarded self.cfg.trainer.algorithm.grpo_norm_by_std correctly, so this change stays intentionally small and focused on the default surfaces rather than rewriting trainer logic.

It also avoids broad example churn: examples that already pin false remain explicit, while other flows inherit the new default naturally.

Testing

UV_CACHE_DIR=/tmp/uv-cache uv run --isolated --extra skyrl-train --extra dev pytest -q \
  tests/train/test_config.py \
  tests/train/test_trainer.py \
  tests/backends/skyrl_train/utils/test_ppo_utils.py
UV_CACHE_DIR=/tmp/uv-cache uv run --isolated --extra skyrl-train --extra dev pytest -q \
  tests/train/test_eval.py \
  tests/train/test_trainer_utils.py

Open with Devin

@taivu1998 taivu1998 closed this Apr 3, 2026
@taivu1998 taivu1998 reopened this Apr 4, 2026
@taivu1998 taivu1998 marked this pull request as ready for review April 5, 2026 00:07
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the default value of grpo_norm_by_std from true to false across the codebase, including the AlgorithmConfig class, PPO utility functions, and the base YAML configuration. The documentation has been updated to reflect this change and provide instructions for recovering the previous behavior. Additionally, new unit tests have been added to verify the new default setting and ensure that explicit overrides are respected. I have no feedback to provide.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[skyrl-train] consider updating grpo_norm_by_std default to false

1 participant