Skip to content

Draft: merge any_model: mip_and_realize_models#995

Merged
danielkorzekwa merged 47 commits intofeature/puzzletronfrom
dkorzekwa/mip_and_realize_models
Mar 13, 2026
Merged

Draft: merge any_model: mip_and_realize_models#995
danielkorzekwa merged 47 commits intofeature/puzzletronfrom
dkorzekwa/mip_and_realize_models

Conversation

@danielkorzekwa
Copy link

@danielkorzekwa danielkorzekwa commented Mar 6, 2026

What does this PR do?

Merging dkorzekwa/mip_and_realize_models into dkorzekwa/any_model_calc_one_block_scores - this MR is only for reviewing. Ultimately dkorzekwa/mip_and_realize_models should be merged into feature/puzzletron once dkorzekwa/any_model_calc_one_block_scores is merged there.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enabled model realization step during compression workflow after scoring phase completes.
  • Bug Fixes

    • Fixed key-value head calculation in attention configuration sourcing.
  • Tests

    • Strengthened validation checks for compression artifacts and output directories; added rank-aware assertions for model compression expectations.
  • Chores

    • Minor documentation formatting updates.

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged)

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 6, 2026 14:13
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

📝 Walkthrough

Walkthrough

This PR modifies the Puzzletron model optimization pipeline by changing how key-value head counts are computed from block configuration, enabling the MIP realization step in the main pipeline, and replacing disabled test assertions with active validation checks for post-processing artifacts.

Changes

Cohort / File(s) Summary
Pipeline Logic
modelopt/torch/puzzletron/mip/run_puzzle.py, modelopt/torch/puzzletron/puzzletron.py
Updated key-value head computation to source directly from block_config.attention.num_key_value_heads instead of deriving from subblock args. Uncommented and enabled Step 6 MIP realization execution in the main pipeline workflow.
Validation and Testing
modelopt/torch/puzzletron/tools/validate_model.py, tests/gpu/torch/puzzletron/test_puzzletron.py
Minor docstring formatting in validate_model.py. Replaced disabled test assertions with active rank-aware validation checks that verify pruning files, checkpoint directories, MIP solution artifacts, loss metrics, build artifacts, and scoring results.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title 'Draft: merge any_model: mip_and_realize_models' is misleading. It describes a merge operation into an intermediate branch, but the actual changes enable MIP model realization and fix pruning validation logic—not a merge operation. Use a title that reflects the substantive changes: something like 'Enable MIP model realization and enhance pruning validation' or 'Implement MIP and realize models with improved test coverage'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Security Anti-Patterns ✅ Passed The modifications in run_puzzle.py, puzzletron.py, and validate_model.py only adjust internal logic and uncomment a function call, introducing no new torch.load calls, numpy.load usage, hardcoded trust_remote_code flags, eval/exec statements, or "# nosec" comments.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dkorzekwa/mip_and_realize_models
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Base automatically changed from dkorzekwa/any_model_calc_one_block_scores to feature/puzzletron March 12, 2026 23:37
@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 12, 2026 23:37
@danielkorzekwa danielkorzekwa requested a review from realAsma March 12, 2026 23:37
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/gpu/torch/puzzletron/test_puzzletron.py (1)

144-152: Consider defensive handling for missing directory.

If mip_solutions_dir doesn't exist (e.g., due to an earlier step failure), iterdir() will raise FileNotFoundError. A more explicit check could improve debuggability.

Optional: Add existence check before iteration
             # assertions for the mip_and_realize_models step 6
             # Find the MIP solution directory dynamically (e.g., stats_num_local_experts_*)
             mip_solutions_dir = puzzle_dir / "mip/puzzle_solutions"
+            assert mip_solutions_dir.exists(), (
+                f"MIP solutions directory not found: {mip_solutions_dir}"
+            )
             solution_dirs = [
                 d
                 for d in mip_solutions_dir.iterdir()
                 if d.is_dir() and d.name.startswith("stats_num_local_experts_")
             ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/gpu/torch/puzzletron/test_puzzletron.py` around lines 144 - 152, The
code assumes mip_solutions_dir exists before calling
mip_solutions_dir.iterdir(); add a defensive existence check (e.g., if not
mip_solutions_dir.exists() or not mip_solutions_dir.is_dir()) before iterating
and raise/assert with a clear error message that includes puzzle_dir and
mip_solutions_dir when missing, or set solution_dirs to an empty list and fail
the subsequent assertion with that informative message; update the block around
mip_solutions_dir, solution_dirs and the following assert to use this check so
FileNotFoundError is avoided and debugging is easier.
modelopt/torch/puzzletron/puzzletron.py (1)

65-74: Minor: Step numbering skips Step 3.

The comments indicate Steps 0, 1, 2, 4, 5, 6 — Step 3 is missing. This might be intentional (perhaps a removed step), but it could cause confusion when referencing steps in logs or documentation.

Consider renumbering for clarity
-    # Step 4: build_library_and_stats (single process)
+    # Step 3: build_library_and_stats (single process)
     if dist.is_master():
         build_library_and_stats.launch_build_library_and_stats(hydra_cfg)
     dist.barrier()

-    # Step 5: calc_one_block_scores (distributed processing)
+    # Step 4: calc_one_block_scores (distributed processing)
     scoring.launch_scoring(hydra_cfg)

-    # Step 6: mip_and_realize_models (distributed processing)
+    # Step 5: mip_and_realize_models (distributed processing)
     mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/puzzletron.py` around lines 65 - 74, The step
comments in puzzletron.py skip Step 3 which is confusing; update the comments
around the sequence that includes dist.is_master(),
build_library_and_stats.launch_build_library_and_stats, dist.barrier(),
scoring.launch_scoring, and mip_and_realize_models.launch_mip_and_realize_model
to either renumber them sequentially (e.g., change "Step 4"→"Step 3", "Step
5"→"Step 4", "Step 6"→"Step 5") or add a brief comment like "Step 3 removed
intentionally" to make the gap explicit and avoid confusion when referencing
steps in logs/docs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@modelopt/torch/puzzletron/puzzletron.py`:
- Around line 65-74: The step comments in puzzletron.py skip Step 3 which is
confusing; update the comments around the sequence that includes
dist.is_master(), build_library_and_stats.launch_build_library_and_stats,
dist.barrier(), scoring.launch_scoring, and
mip_and_realize_models.launch_mip_and_realize_model to either renumber them
sequentially (e.g., change "Step 4"→"Step 3", "Step 5"→"Step 4", "Step 6"→"Step
5") or add a brief comment like "Step 3 removed intentionally" to make the gap
explicit and avoid confusion when referencing steps in logs/docs.

In `@tests/gpu/torch/puzzletron/test_puzzletron.py`:
- Around line 144-152: The code assumes mip_solutions_dir exists before calling
mip_solutions_dir.iterdir(); add a defensive existence check (e.g., if not
mip_solutions_dir.exists() or not mip_solutions_dir.is_dir()) before iterating
and raise/assert with a clear error message that includes puzzle_dir and
mip_solutions_dir when missing, or set solution_dirs to an empty list and fail
the subsequent assertion with that informative message; update the block around
mip_solutions_dir, solution_dirs and the following assert to use this check so
FileNotFoundError is avoided and debugging is easier.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7ed33a66-d1fb-4928-ad4d-611f6fb09e49

📥 Commits

Reviewing files that changed from the base of the PR and between eb4b210 and 47b4479.

📒 Files selected for processing (4)
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/puzzletron.py
  • modelopt/torch/puzzletron/tools/validate_model.py
  • tests/gpu/torch/puzzletron/test_puzzletron.py
💤 Files with no reviewable changes (1)
  • modelopt/torch/puzzletron/tools/validate_model.py

@codecov
Copy link

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (eb4b210) to head (1c1e983).
⚠️ Report is 1 commits behind head on feature/puzzletron.

Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/puzzletron     #995      +/-   ##
======================================================
- Coverage               72.13%   72.12%   -0.02%     
======================================================
  Files                     209      209              
  Lines                   23628    23628              
======================================================
- Hits                    17045    17042       -3     
- Misses                   6583     6586       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@danielkorzekwa danielkorzekwa merged commit 8fe318d into feature/puzzletron Mar 13, 2026
28 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/mip_and_realize_models branch March 13, 2026 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants