Add Full TE Spec support for Megatron Pruning DynamicModules + MoE bug fixes #1024
kevalmorabia97 wants to merge 6 commits into main from
Conversation
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
📝 Walkthrough
This PR migrates the NVIDIA Model Optimizer codebase to use full Transformer Engine specifications for Minitron pruning. The changes consolidate TE module integrations from a dedicated plugin into the Megatron plugin, introduce a new factory function for TE-compatible Mamba stacks, update activation collection logic to handle TE's fused layer normalization outputs, and align test utilities and test suites to exercise the TE implementation paths.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff            @@
##             main    #1024    +/-  ##
=======================================
+ Coverage   70.09%   70.12%   +0.03%
=======================================
  Files         221      221
  Lines       25459    25459
=======================================
+ Hits        17845    17854       +9
+ Misses       7614     7605       -9
Force-pushed: ade6edf → d4820c8 → 98d5291; f3071a3 → 8f42e0f → cff7137
What does this PR do?
Type of change: Improvement
Quantization recently added support for the full TE spec. This PR adds the same for pruning, so we can retire the ModelOpt spec and use the standard TE spec everywhere.
NOTE: We still don't support TEGroupedGemm and instead use TE SequentialMLP for now (but this can be configured in the standard TE spec, so we don't need the ModelOpt spec for it).
Note that this does not change how the pruning workflow is used, but it makes pruning slightly faster and may produce a slightly different pruned model because of different kernels and numerics.
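One practical consequence of switching to TE modules (noted in the walkthrough above) is that fused layers such as Transformer Engine's LayerNormLinear can return a tuple (linear output, layernorm output) rather than a bare tensor, so activation-collection hooks must unwrap the output before recording it. Below is a minimal, framework-free sketch of that unwrapping idea; the names (`unwrap_te_output`, `ActivationCollector`) are illustrative, not ModelOpt's actual API:

```python
def unwrap_te_output(output):
    """Normalize a module's forward output to a single tensor-like value.

    TE fused modules (e.g. LayerNormLinear configured to also return the
    normalized input) return a tuple whose first element is the main
    output; plain modules return the output directly.
    """
    if isinstance(output, tuple):
        return output[0]
    return output


class ActivationCollector:
    """Hypothetical forward-hook callable that records activations."""

    def __init__(self):
        self.activations = []

    def __call__(self, module, inputs, output):
        # Record only the main output, regardless of module type.
        self.activations.append(unwrap_te_output(output))
```

With a hook shaped like this, the same collection code works for both plain and TE fused modules without special-casing each layer type.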
[Bug fix]: NAS-based pruning for MoE models previously hung when evaluating MMLU on pruned candidate models; this is fixed in this PR as well.
[Bug fix]: Hidden-size importance hooks were previously not applied to pre_mlp_layernorm in MoE layers; fixing this in this PR yields a significant MMLU improvement for pruned Qwen3-30B-A3B.
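The second fix above amounts to ensuring that every layernorm observing the hidden dimension contributes to the hidden-size importance score, including the MoE layer's separate pre-MLP layernorm. A hypothetical sketch of the idea, using plain dicts as a stand-in for modules (names are illustrative, not ModelOpt's API):

```python
def layernorms_to_hook(layer):
    """Return the layernorm submodules that should receive a
    hidden-size importance hook for one transformer layer.

    `layer` is a dict-like stand-in mapping submodule names to modules.
    """
    hooks = [layer["input_layernorm"]]
    # MoE layers keep a separate layernorm in front of the expert MLP.
    # If it is skipped (the bug fixed above), hidden-size importance
    # misses the activations feeding the MoE block entirely.
    if "pre_mlp_layernorm" in layer:
        hooks.append(layer["pre_mlp_layernorm"])
    return hooks
```

For a dense layer only `input_layernorm` is hooked; for an MoE layer both layernorms are, so the aggregated importance reflects activations entering the attention block and the expert MLP.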
Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
- CONTRIBUTING.md: N/A

Additional Information
OMNIML-3504