Draft: merge any_model: mip_and_realize_models#995
Draft: merge any_model: mip_and_realize_models#995danielkorzekwa merged 47 commits intofeature/puzzletronfrom
Conversation
- Add converter, model_descriptor, puzzformer, and llama model support - Selective merge of anymodel functionality Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
📝 WalkthroughWalkthroughThis PR modifies the Puzzletron model optimization pipeline by changing how key-value head counts are computed from block configuration, enabling the MIP realization step in the main pipeline, and replacing disabled test assertions with active validation checks for post-processing artifacts. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ld_library_and_stats
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…kwa/any_model_calc_one_block_scores
…wa/mip_and_realize_models
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ld_library_and_stats
…kwa/any_model_calc_one_block_scores
…wa/mip_and_realize_models
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tests/gpu/torch/puzzletron/test_puzzletron.py (1)
144-152: Consider defensive handling for missing directory.If
mip_solutions_dirdoesn't exist (e.g., due to an earlier step failure),iterdir()will raiseFileNotFoundError. A more explicit check could improve debuggability.Optional: Add existence check before iteration
# assertions for the mip_and_realize_models step 6 # Find the MIP solution directory dynamically (e.g., stats_num_local_experts_*) mip_solutions_dir = puzzle_dir / "mip/puzzle_solutions" + assert mip_solutions_dir.exists(), ( + f"MIP solutions directory not found: {mip_solutions_dir}" + ) solution_dirs = [ d for d in mip_solutions_dir.iterdir() if d.is_dir() and d.name.startswith("stats_num_local_experts_") ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/gpu/torch/puzzletron/test_puzzletron.py` around lines 144 - 152, The code assumes mip_solutions_dir exists before calling mip_solutions_dir.iterdir(); add a defensive existence check (e.g., if not mip_solutions_dir.exists() or not mip_solutions_dir.is_dir()) before iterating and raise/assert with a clear error message that includes puzzle_dir and mip_solutions_dir when missing, or set solution_dirs to an empty list and fail the subsequent assertion with that informative message; update the block around mip_solutions_dir, solution_dirs and the following assert to use this check so FileNotFoundError is avoided and debugging is easier.modelopt/torch/puzzletron/puzzletron.py (1)
65-74: Minor: Step numbering skips Step 3.The comments indicate Steps 0, 1, 2, 4, 5, 6 — Step 3 is missing. This might be intentional (perhaps a removed step), but it could cause confusion when referencing steps in logs or documentation.
Consider renumbering for clarity
- # Step 4: build_library_and_stats (single process) + # Step 3: build_library_and_stats (single process) if dist.is_master(): build_library_and_stats.launch_build_library_and_stats(hydra_cfg) dist.barrier() - # Step 5: calc_one_block_scores (distributed processing) + # Step 4: calc_one_block_scores (distributed processing) scoring.launch_scoring(hydra_cfg) - # Step 6: mip_and_realize_models (distributed processing) + # Step 5: mip_and_realize_models (distributed processing) mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/puzzletron.py` around lines 65 - 74, The step comments in puzzletron.py skip Step 3 which is confusing; update the comments around the sequence that includes dist.is_master(), build_library_and_stats.launch_build_library_and_stats, dist.barrier(), scoring.launch_scoring, and mip_and_realize_models.launch_mip_and_realize_model to either renumber them sequentially (e.g., change "Step 4"→"Step 3", "Step 5"→"Step 4", "Step 6"→"Step 5") or add a brief comment like "Step 3 removed intentionally" to make the gap explicit and avoid confusion when referencing steps in logs/docs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@modelopt/torch/puzzletron/puzzletron.py`:
- Around line 65-74: The step comments in puzzletron.py skip Step 3 which is
confusing; update the comments around the sequence that includes
dist.is_master(), build_library_and_stats.launch_build_library_and_stats,
dist.barrier(), scoring.launch_scoring, and
mip_and_realize_models.launch_mip_and_realize_model to either renumber them
sequentially (e.g., change "Step 4"→"Step 3", "Step 5"→"Step 4", "Step 6"→"Step
5") or add a brief comment like "Step 3 removed intentionally" to make the gap
explicit and avoid confusion when referencing steps in logs/docs.
In `@tests/gpu/torch/puzzletron/test_puzzletron.py`:
- Around line 144-152: The code assumes mip_solutions_dir exists before calling
mip_solutions_dir.iterdir(); add a defensive existence check (e.g., if not
mip_solutions_dir.exists() or not mip_solutions_dir.is_dir()) before iterating
and raise/assert with a clear error message that includes puzzle_dir and
mip_solutions_dir when missing, or set solution_dirs to an empty list and fail
the subsequent assertion with that informative message; update the block around
mip_solutions_dir, solution_dirs and the following assert to use this check so
FileNotFoundError is avoided and debugging is easier.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7ed33a66-d1fb-4928-ad4d-611f6fb09e49
📒 Files selected for processing (4)
modelopt/torch/puzzletron/mip/run_puzzle.pymodelopt/torch/puzzletron/puzzletron.pymodelopt/torch/puzzletron/tools/validate_model.pytests/gpu/torch/puzzletron/test_puzzletron.py
💤 Files with no reviewable changes (1)
- modelopt/torch/puzzletron/tools/validate_model.py
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## feature/puzzletron #995 +/- ##
======================================================
- Coverage 72.13% 72.12% -0.02%
======================================================
Files 209 209
Lines 23628 23628
======================================================
- Hits 17045 17042 -3
- Misses 6583 6586 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
What does this PR do?
Merging dkorzekwa/mip_and_realize_models into dkorzekwa/any_model_calc_one_block_scores - this MR is only for reviewing. Ultimately dkorzekwa/mip_and_realize_models should be merged into feature/puzzletron once dkorzekwa/any_model_calc_one_block_scores is merged there.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests
Chores