[skyrl-train] Flip grpo_norm_by_std default to false by taivu1998 · Pull Request #1443 · NovaSky-AI/SkyRL

taivu1998 · 2026-04-02T23:08:33Z

Summary

This PR updates the default value of trainer.algorithm.grpo_norm_by_std from true to false across the canonical config and utility surfaces.

It keeps the existing trainer plumbing intact, aligns the config docs with the new default, and adds regression coverage so the repo now explicitly protects:

the typed config default
the legacy YAML translation path
the GRPO utility default behavior
the trainer forwarding behavior for both default and explicit override cases

Closes #869.

What changed

Flip AlgorithmConfig.grpo_norm_by_std to False
Flip the legacy/base YAML default to false
Flip the GRPO utility defaults in ppo_utils.py to False
Update the configuration docs and migration wording
Remove stale hardcoded grpo_norm_by_std=True from shared test helpers
Add targeted tests for:
- default behavior equals explicit false
- explicit true
- explicit false
- trainer pass-through behavior

Why this approach

The runtime already forwarded self.cfg.trainer.algorithm.grpo_norm_by_std correctly, so this change stays intentionally small and focused on the default surfaces rather than rewriting trainer logic.

It also avoids broad example churn: examples that already pin false remain explicit, while other flows inherit the new default naturally.

Testing

UV_CACHE_DIR=/tmp/uv-cache uv run --isolated --extra skyrl-train --extra dev pytest -q \
  tests/train/test_config.py \
  tests/train/test_trainer.py \
  tests/backends/skyrl_train/utils/test_ppo_utils.py

UV_CACHE_DIR=/tmp/uv-cache uv run --isolated --extra skyrl-train --extra dev pytest -q \
  tests/train/test_eval.py \
  tests/train/test_trainer_utils.py

gemini-code-assist

Code Review

This pull request updates the default value of grpo_norm_by_std from true to false across the codebase, including the AlgorithmConfig class, PPO utility functions, and the base YAML configuration. The documentation has been updated to reflect this change and provide instructions for recovering the previous behavior. Additionally, new unit tests have been added to verify the new default setting and ensure that explicit overrides are respected. I have no feedback to provide.

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Flip GRPO std normalization default to false

3f7be4f

taivu1998 closed this Apr 3, 2026

taivu1998 reopened this Apr 4, 2026

Skip SkyRL GPU workflow without Anyscale token

e3d03cf

taivu1998 marked this pull request as ready for review April 5, 2026 00:07

gemini-code-assist bot reviewed Apr 5, 2026

View reviewed changes

devin-ai-integration bot reviewed Apr 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[skyrl-train] Flip grpo_norm_by_std default to false#1443

[skyrl-train] Flip grpo_norm_by_std default to false#1443
taivu1998 wants to merge 2 commits intoNovaSky-AI:mainfrom
taivu1998:tdv/issue-869-grpo-default

taivu1998 commented Apr 2, 2026 •

edited by devin-ai-integration bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

devin-ai-integration bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

taivu1998 commented Apr 2, 2026 • edited by devin-ai-integration bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Why this approach

Testing

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

devin-ai-integration bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

taivu1998 commented Apr 2, 2026 •

edited by devin-ai-integration bot

Loading