
ModelOpt Framework, Recipe Lib, converting subset of existing recipes 1/N#1000

Open
shengliangxu wants to merge 4 commits into main from shengliangx/modeopt-recipe-lib-mvp

Conversation


@shengliangxu shengliangxu commented Mar 6, 2026

What does this PR do?

  1. start a new config system using yaml/yml files. The config system adopts Hydra-style overrides using the __base__ tag, which matches pydantic's base-model mechanism. __base__ is a list; semantically it is inheritance. The implementation uses OmegaConf to merge configs in the order given in the __base__ list.

  2. add a new top level package: modelopt_recipes

    I want it to be a top-level package so we can make it clear that the modelopt package holds the code and this new package holds recipes

  3. make a list of reusable quant config units, sitting at modelopt_recipes/configs/ptq/.... Mostly 3 types:

    3.1 model weights and activations quant config
    3.2 kv quant config
    3.3 algorithm config

    with the base config mechanism, we can have a much clearer config system

  4. implement some of the existing quantization recipes using the new config system as model-agnostic general recipes, though not actually in use yet. These recipes sit inside modelopt_recipes/general/ptq/...

  5. make sure the configs from the new config system match the existing configs

  6. extend the hf_ptq script to enable recipe-based PTQ

  7. tested hf_ptq using both built-in and external config files. example script:
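As a rough illustration of the __base__ merge semantics from item 1, here is a dependency-free sketch. The real loader uses OmegaConf.merge on YAML files on disk; the in-memory CONFIGS dict and this load_config are hypothetical stand-ins, not the PR's actual implementation:

```python
import yaml

# Hypothetical in-memory stand-ins for YAML config files on disk.
CONFIGS = {
    "configs/ptq/base": "quant_cfg:\n  default:\n    enable: false\n",
    "configs/ptq/w8a8_fp8": (
        "__base__: [configs/ptq/base]\n"
        "quant_cfg:\n"
        "  '*weight_quantizer':\n"
        "    num_bits: [4, 3]\n"
    ),
}

def _deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base (override wins), mimicking OmegaConf.merge."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(out.get(key), dict) and isinstance(value, dict):
            out[key] = _deep_merge(out[key], value)
        else:
            out[key] = value
    return out

def load_config(name: str) -> dict:
    """Resolve __base__ recursively, merging bases in list order, then the file itself."""
    data = yaml.safe_load(CONFIGS[name])
    merged: dict = {}
    for base in data.pop("__base__", []):
        merged = _deep_merge(merged, load_config(base))
    return _deep_merge(merged, data)

cfg = load_config("configs/ptq/w8a8_fp8")
# cfg keeps the base's disabled default quantizer and adds the derived weight quantizer.
```

A derived config thus inherits everything from its bases and only spells out the keys it overrides, which is what makes the small composable units in item 3 possible.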

Usage

   python examples/llm_ptq/hf_ptq.py         \
     --model Qwen/Qwen3-8B                   \
     --recipe general/ptq/fp8_default-fp8_kv \
     ...

Testing

   python examples/llm_ptq/hf_ptq.py         \
     --model Qwen/Qwen3-8B                   \
     --recipe general/ptq/fp8_default-fp8_kv \
     --export_path=fp8_default-fp8_kv        \
     --calib_size=16                         \
     --batch_size=0                          \
     --trust_remote_code                     \
     --export_fmt=hf

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Recipe-driven PTQ workflows via YAML recipes and a new recipe loader; the CLI gains a --recipe option, and --pyt_ckpt_path is renamed to --model.
    • Many new PTQ recipe and config presets (FP8, INT4/INT8, NVFP4, MXFPx, KV-cache variants) and improved runtime config loading/merging.
  • Documentation

    • Added READMEs describing recipe/config layout.
  • Tests

    • New unit tests covering config loading, inheritance and recipe loading.
  • Chores

    • Added YAML/OmegaConf runtime support and packaging of recipe YAMLs.


coderabbitai bot commented Mar 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a recipe-driven PTQ system: new recipe models and loader, YAML-based config loading/merging with base inheritance, many PTQ recipe/config YAMLs, example CLI support for recipes, and related refactors to quant_cfg handling across examples and quantization utilities.

Changes

Cohort / File(s) Summary
Recipe core
modelopt/torch/recipe/__init__.py, modelopt/torch/recipe/config.py, modelopt/torch/recipe/loader.py
New package exporting recipe API, pydantic recipe models (RecipeType, ModelOptPTQRecipe), and loader functions including load_recipe supporting merged YAML files and legacy directory-format recipes.
Config loader & infra
modelopt/torch/opt/config.py, modelopt/torch/quantization/config.py
Added load_config with YAML loading and recursive base merging (OmegaConf), BUILTIN_RECIPES_LIB constant; replaced several hardcoded quantizer defaults with runtime-loaded configs; extended QuantizerAttributeConfig types and validators.
Recipes & configs (many YAMLs)
modelopt_recipes/**/configs/ptq/*.yml, modelopt_recipes/general/ptq/*.yml, modelopt_recipes/configs/README, modelopt_recipes/general/README
Added large set of PTQ config presets and composed recipes (calibration algorithms, KV quant configs, weight/input presets, and many named recipe aggregates) and READMEs.
Example scripts & utils
examples/llm_ptq/example_utils.py, examples/llm_ptq/hf_ptq.py, examples/llm_ptq/multinode_ptq.py
Refactored build_quant_cfg signature to accept a precomputed quant_cfg; removed kv-cache qformat paths from example_utils; added recipe-driven quantization branch in hf_ptq (new --recipe CLI option); renamed --pyt_ckpt_path to --model.
KV cache quant integration
modelopt/torch/quantization/utils.py, examples/llm_ptq/*
Added deepcopy in update_quant_cfg_with_kv_cache_quant to avoid mutating caller state; examples now update quant_cfg with KV quant from recipes or KV presets rather than separate kv_qformat flows.
Packaging / resources
modelopt_recipes/__init__.py, pyproject.toml
New modelopt_recipes package exposing version, added PyYAML and omegaconf dependencies, and included YAML package-data for modelopt_recipes.
Tests
tests/unit/torch/recipe/test_loader.py, tests/_test_utils/torch/quantization/quantize_common.py, tests/unit/torch/recipe/__init__.py
New comprehensive unit tests for load_config/load_recipe and added small safety fix in quantize_common; test header additions.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PTQScript as PTQ Script<br/>(hf_ptq.py)
    participant RecipeLoader as Recipe Loader<br/>(loader.py)
    participant ConfigLoader as Config Loader<br/>(opt/config.py)
    participant FileSystem as File System/<br/>Built-in Resources
    participant Quantizer as Quantizer<br/>(mtq / quant utils)

    User->>PTQScript: run with --recipe path/to/recipe
    PTQScript->>RecipeLoader: load_recipe(recipe_path)
    RecipeLoader->>FileSystem: resolve path (builtin or filesystem)
    alt merged YAML file
        RecipeLoader->>FileSystem: read recipe YAML
        RecipeLoader->>ConfigLoader: load_config(model_quant references)
        ConfigLoader->>FileSystem: read referenced YAMLs, resolve __base__ recursively
        ConfigLoader-->>RecipeLoader: merged QuantizeConfig
    else directory-format recipe
        RecipeLoader->>FileSystem: read recipe descriptor and model_quant/kv_quant files
        RecipeLoader->>ConfigLoader: load_config for each component
        ConfigLoader-->>RecipeLoader: merged components
    end
    RecipeLoader-->>PTQScript: ModelOptPTQRecipe (model_quant, kv_quant)
    PTQScript->>PTQScript: build_quant_cfg(recipe.model_quant)
    PTQScript->>PTQScript: update_quant_cfg_with_kv_cache_quant(recipe.kv_quant)
    PTQScript->>Quantizer: quantize(model, quant_cfg)
    Quantizer-->>User: quantized model

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: introducing a recipe library framework and converting existing recipes to YAML-based format as the first part of a multi-part PR series.
Docstring Coverage ✅ Passed Docstring coverage is 88.00% which is sufficient. The required threshold is 80.00%.
Security Anti-Patterns ✅ Passed The pull request does not introduce any of the six critical security anti-patterns. All YAML loading uses safe_load(), new dependencies have permissive licenses, and security-sensitive parameters are properly exposed as configurable options.


@shengliangxu shengliangxu force-pushed the shengliangx/modeopt-recipe-lib-mvp branch from ee14948 to c2d3678 on March 6, 2026 21:10

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/llm_ptq/example_utils.py (1)

196-241: ⚠️ Potential issue | 🟠 Major

Clone quant_cfg before applying model-specific overrides.

This function mutates the passed-in config in the moe_calib_experts_ratio, Gemma int8_sq, and phi4mm branches; only the AWQ path gets a deep copy today. Reusing the same recipe/config object across calls will leak those edits into later quantization runs.

Proposed fix
 def build_quant_cfg(
     qformat,
     quant_cfg,
     awq_block_size,
     model_type,
     moe_calib_experts_ratio: float | None = None,
 ) -> dict[str, Any]:
+    quant_cfg = copy.deepcopy(quant_cfg)
     if "awq" in str(quant_cfg.get("algorithm")):
-        quant_cfg = copy.deepcopy(quant_cfg)
         weight_quantizer = quant_cfg["quant_cfg"]["*weight_quantizer"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/example_utils.py` around lines 196 - 241, The function
build_quant_cfg mutates the input quant_cfg in multiple branches
(moe_calib_experts_ratio, Gemma int8_sq, phi4mm), causing callers to observe
leaked changes; fix this by making a deep copy of quant_cfg immediately on entry
(e.g., quant_cfg = copy.deepcopy(quant_cfg)) and then perform all modifications
on that copy (so weight_quantizer, quant_cfg["algorithm"], and
quant_cfg["quant_cfg"] edits do not affect the caller's object). Ensure the
deepcopy happens before any reads/writes that may alter nested structures in
build_quant_cfg.
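The aliasing hazard flagged above can be reproduced with a minimal stand-in (plain dicts and a hypothetical template, not the real quant_cfg structures):

```python
import copy

# Stand-in for a module-level template such as QUANT_CFG_CHOICES[qformat].
QUANT_CFG_TEMPLATE = {
    "algorithm": "max",
    "quant_cfg": {"*weight_quantizer": {"num_bits": 8}},
}

def build_quant_cfg_buggy(quant_cfg: dict) -> dict:
    # Mutates the caller's object: the per-model edit leaks into the shared template.
    quant_cfg["quant_cfg"]["*weight_quantizer"]["num_bits"] = 4
    return quant_cfg

def build_quant_cfg_fixed(quant_cfg: dict) -> dict:
    # Deep-copy on entry so model-specific overrides stay local to this call.
    quant_cfg = copy.deepcopy(quant_cfg)
    quant_cfg["quant_cfg"]["*weight_quantizer"]["num_bits"] = 4
    return quant_cfg

fixed = build_quant_cfg_fixed(QUANT_CFG_TEMPLATE)
# The template is untouched: its num_bits is still 8, while fixed carries 4.
```

A single deepcopy at the top of the function covers every mutating branch, which is why the proposed fix hoists it out of the AWQ-only path.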
🟡 Minor comments (11)
modelopt_recipes/configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml-1-2 (1)

1-2: ⚠️ Potential issue | 🟡 Minor

Filename appears to have unintentional duplication.

The filename w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml contains _nvfp4_mlp_only repeated twice. This looks like a copy-paste error. Consider renaming to something like w4a4_nvfp4_mlp_only.yml for consistency with other config naming conventions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt_recipes/configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml` around
lines 1 - 2, The file name contains a duplicated token ("_nvfp4_mlp_only") —
rename the config file from w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml to
w4a4_nvfp4_mlp_only.yml and update any references to this file (e.g., in tests,
loaders, or CI) so they point to the new filename to keep naming consistent with
other configs.
modelopt_recipes/configs/ptq/kv_fp8_affine.yml-16-22 (1)

16-22: ⚠️ Potential issue | 🟡 Minor

Add missing enable: true property.

The YAML structure for the bias section is correct; however, enable: true should be added as a sibling property to num_bits and bias, matching the structure in related config files (kv_fp8.yml, kv_nvfp4_affine.yml).

Proposed fix
 quant_cfg:
   '*[kv]_bmm_quantizer':
     num_bits: [4, 3]
     bias:
       -2:
       -4:
       type: static
+    enable: true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt_recipes/configs/ptq/kv_fp8_affine.yml` around lines 16 - 22, In the
quant_cfg block for '*[kv]_bmm_quantizer' add the missing sibling property
"enable: true" (next to num_bits and bias) so the entry reads the same structure
as other configs (e.g., kv_fp8.yml and kv_nvfp4_affine.yml); update the YAML
under the '*[kv]_bmm_quantizer' key to include enable: true directly beneath
num_bits and alongside bias to enable the quantizer.
modelopt_recipes/general/README-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor

Fix the typo in the PTQ description.

Traing should be Training.

Suggested fix
-ptq/*: model-agnostic general Post Traing Quantization recipes
+ptq/*: model-agnostic general Post Training Quantization recipes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt_recipes/general/README` at line 3, Fix the typo in the README line
that currently reads "ptq/*: model-agnostic general Post Traing Quantization
recipes" by changing "Traing" to "Training" so it reads "ptq/*: model-agnostic
general Post Training Quantization recipes"; locate the string containing
"ptq/*" or "Post Traing Quantization" and update it accordingly.
modelopt_recipes/configs/ptq/w8a8_int8_per_channel_int8.yml-20-22 (1)

20-22: ⚠️ Potential issue | 🟡 Minor

Use explicit axis: null or omit the key entirely.

Line 22 has a blank axis: that parses to YAML null. While this works correctly (functionally equivalent to axis: None for per-tensor quantization), it's ambiguous and inconsistent with explicit patterns in Python tests and code. For clarity and consistency across the codebase, either write axis: null explicitly or remove the key entirely (the schema defaults axis to None).

This pattern appears in other configs as well (w8a8_fp8_fp8.yml, w4a8_mxfp4_fp8.yml).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt_recipes/configs/ptq/w8a8_int8_per_channel_int8.yml` around lines 20
- 22, The YAML fragment under the "*input_quantizer" mapping currently contains
a blank "axis:" key which parses to null but is ambiguous; update the mapping by
either setting "axis: null" explicitly or removing the "axis" key entirely so
the schema default (None) is used—apply the same change wherever you see the
blank "axis:" pattern (e.g., in the w8a8_fp8_fp8.yml and w4a8_mxfp4_fp8.yml
configs) to keep the intent consistent with Python tests and code.
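The parsing behavior behind this comment is easy to verify: a key with no value is YAML null, which PyYAML loads as Python None. A minimal check, independent of the actual config schema:

```python
import yaml

# A blank value and an explicit null parse identically; an omitted key
# differs only in that the key is absent from the mapping.
blank = yaml.safe_load("axis:\n")        # {"axis": None}
explicit = yaml.safe_load("axis: null\n")  # {"axis": None}
omitted = yaml.safe_load("other: 1\n")     # no "axis" key at all
```

So `axis:` and `axis: null` are functionally equivalent here; the review's point is purely about making the intent legible.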
modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/recipe.yml-18-18 (1)

18-18: ⚠️ Potential issue | 🟡 Minor

Typo in description: missing space.

"per tokenactivation" should be "per token activation".

📝 Proposed fix
-description: "FP8 per channel weight, FP8 per tokenactivation, FP8 KV Cache, max calibration."
+description: "FP8 per channel weight, FP8 per token activation, FP8 KV Cache, max calibration."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/recipe.yml` at
line 18, Description string contains a typo "per tokenactivation"; update the
description value in recipe.yml (the description field) to read "FP8 per channel
weight, FP8 per token activation, FP8 KV Cache, max calibration." — locate the
description key in
modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/recipe.yml and
insert the missing space between "token" and "activation".
modelopt/torch/opt/config.py-426-430 (1)

426-430: ⚠️ Potential issue | 🟡 Minor

Validate the YAML root before assuming a mapping.

Blank/comment-only files or a list/scalar document make Line 430 fail with AttributeError because yaml.safe_load() can return non-dicts. Raise a ValueError here so recipe load failures stay actionable.

Proposed fix
     content = config_path.read_text(encoding="utf-8")
 
     config_data = yaml.safe_load(content)
+    if not isinstance(config_data, dict):
+        raise ValueError(
+            f"Config {config_path} must contain a top-level mapping, "
+            f"got {type(config_data).__name__}"
+        )
 
     bases = [load_config(base_cfg) for base_cfg in config_data.pop("__base__", [])]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/config.py` around lines 426 - 430, The code reads YAML
into config_data and then assumes it's a mapping when building bases in
load_config; validate that config_data is a dict after yaml.safe_load(content)
and raise a clear ValueError if not (e.g., for None, list, or scalar documents)
so recipe load failures remain actionable; update the logic in the function
around config_data and the bases = [load_config(...)] line to check
isinstance(config_data, dict) and raise with a message referencing config_path
and the unexpected type.
modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv/recipe.yml-18-18 (1)

18-18: ⚠️ Potential issue | 🟡 Minor

Description says this recipe quantizes activations, but the recipe is weight-only.

Line 18 conflicts with the weight_only recipe name and the companion model_quant.yml base (configs/ptq/w8_fp8_2d_blockwise). That makes the catalog metadata misleading when users pick recipes.

📝 Suggested fix
-description: "FP8 2D blockwise weights, FP8 activation, FP8 KV Cache, max calibration."
+description: "FP8 2D blockwise weights, no activation quantization, FP8 KV Cache, max calibration."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv/recipe.yml`
at line 18, The description string currently reads "FP8 2D blockwise weights,
FP8 activation, FP8 KV Cache, max calibration." but the recipe is weight-only
(see weight_only and base model_quant.yml reference to
configs/ptq/w8_fp8_2d_blockwise); update the description to accurately reflect
that only weights are quantized (e.g., "FP8 2D blockwise weight-only
quantization, FP8 KV Cache, max calibration") and remove or reword any mention
of FP8 activation so catalog metadata matches the recipe name and base config.
modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/model_quant.yml-16-19 (1)

16-19: ⚠️ Potential issue | 🟡 Minor

Rename config file to remove duplicated suffix.

The referenced config file w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml has a clear naming error: the suffix _nvfp4_mlp_only is duplicated. The file should be renamed to w4a4_nvfp4_mlp_only.yml for clarity. After renaming, update this reference from configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only to configs/ptq/w4a4_nvfp4_mlp_only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/model_quant.yml` around
lines 16 - 19, Update the duplicated config reference: replace the base entry
"configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only" with the corrected name
"configs/ptq/w4a4_nvfp4_mlp_only" in the __base__ list inside model_quant.yml;
also ensure the actual file on disk is renamed from
w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml to w4a4_nvfp4_mlp_only.yml so the new
reference resolves correctly.
modelopt/torch/recipe/config.py-35-38 (1)

35-38: ⚠️ Potential issue | 🟡 Minor

Replace the copied docstrings in these public models.

Both docstrings describe unrelated layer-rule behavior instead of recipe configuration, so they will mislead anyone relying on generated docs or IDE help.

✏️ Suggested cleanup
 class ModelOptRecipeBase(ModeloptBaseConfig):
-    """Base configuration class for model optimization recipes.
-
-    If a layer name matches ``"*output_layer*"``, the attributes will be replaced with ``{"enable": False}``.
-    """
+    """Base configuration class for model optimization recipes."""
@@
 class ModelOptPTQRecipe(ModelOptRecipeBase):
-    """Our config class for PTQ recipes.
-
-    Rules are what governs the configuration for modifying dynamic module classes.
-    """
+    """Configuration model for PTQ recipes."""

Also applies to: 66-69

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/recipe/config.py` around lines 35 - 38, The docstrings in the
public configuration classes in modelopt/torch/recipe/config.py currently
contain incorrect, copied text about layer-rule behavior; update the docstrings
to accurately describe each class's purpose and public API (e.g., the base
configuration class for model optimization recipes) and remove the unrelated
sentence about matching "*output_layer*" and replacing attributes; ensure both
occurrences (the base config class and the other public config class in this
file) describe their configuration fields, defaults, and intended usage for
generated docs and IDE help.
modelopt/torch/quantization/config.py-845-853 (1)

845-853: ⚠️ Potential issue | 🟡 Minor

Replace assert with ValueError in validator.

Using assert in a field validator is fragile because assertions can be disabled with python -O. Pydantic validators should raise ValueError or TypeError for invalid input.

🛡️ Proposed fix
     `@field_validator`("num_bits", mode="before")
     `@classmethod`
     def tuple_num_bits(cls, num_bits: int | list[int] | tuple[int, int]) -> int | tuple[int, int]:
         """Convert num_bits to tuple if list."""
         if isinstance(num_bits, list):
-            assert len(num_bits) == 2
+            if len(num_bits) != 2:
+                raise ValueError("num_bits list must have exactly 2 elements")
             return cast("tuple[int, int]", tuple(num_bits))
 
         return num_bits
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 845 - 853, The
tuple_num_bits field validator currently uses assert to check list length, which
can be disabled; update the validator (tuple_num_bits in
modelopt/torch/quantization/config.py) to raise a ValueError (or TypeError) with
a clear message when num_bits is a list but len(num_bits) != 2, then return the
cast tuple as before; leave other behavior unchanged.
modelopt/torch/recipe/loader.py-76-79 (1)

76-79: ⚠️ Potential issue | 🟡 Minor

Replace assert with explicit ValueError for validation.

Using assert for input validation is fragile because assertions can be disabled when Python runs with the -O (optimize) flag. This would silently skip validation and could lead to cryptic errors downstream.

🛡️ Proposed fix
-    assert isinstance(recipe_descriptor, dict), "Recipe descriptor should be a dictionary."
-    assert "recipe_type" in recipe_descriptor, (
-        "Recipe descriptor should contain 'recipe_type' field."
-    )
+    if not isinstance(recipe_descriptor, dict):
+        raise ValueError("Recipe descriptor should be a dictionary.")
+    if "recipe_type" not in recipe_descriptor:
+        raise ValueError("Recipe descriptor should contain 'recipe_type' field.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/recipe/loader.py` around lines 76 - 79, Replace the fragile
assert-based checks for the incoming recipe_descriptor with explicit validation
that raises ValueError: check that recipe_descriptor is a dict and if not raise
ValueError("Recipe descriptor should be a dictionary."); then check that
"recipe_type" is present and if not raise ValueError("Recipe descriptor should
contain 'recipe_type' field."). Update the validation in loader.py where
recipe_descriptor is handled so the function (the block containing the current
assert lines) raises these ValueErrors instead of using assert.
🧹 Nitpick comments (5)
modelopt_recipes/configs/ptq/calib_algo_none.yml (1)

16-16: Make the null sentinel explicit.

algorithm: parses as null, but spelling that out makes the “no calibration” intent clearer in these composed configs.

Suggested cleanup
-algorithm:
+algorithm: null
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt_recipes/configs/ptq/calib_algo_none.yml` at line 16, The YAML key
"algorithm" currently parses to null implicitly; make the "no calibration"
intent explicit by setting the algorithm field to an explicit null sentinel
(e.g., use the YAML null token) so readers and config composition logic see the
intended "no calibration" value for the algorithm key.
modelopt_recipes/configs/README (1)

1-3: Clarify the path wording in this README.

Line 3 reads awkwardly and leaves the reference root implicit. I’d rewrite this to something like: Recipes can reference shared configs via paths under configs/....

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt_recipes/configs/README` around lines 1 - 3, Update the README
sentence "The recipes can refer to the configs using path of: configs/..." to
clearer wording; replace it with "Recipes can reference shared configs via paths
under configs/..." so the reference root is explicit and reads naturally (edit
the README line containing that original phrase).
modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv/recipe.yml (1)

18-18: Tighten the recipe description wording.

The current phrase “no quantization activation” is hard to parse. Something like NVFP4 MLP weight-only, no activation quantization, FP8 KV cache, max calibration would read more cleanly and better match the folder name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv/recipe.yml` at line
18, Update the description string in recipe.yml to a clearer phrasing that
matches the folder name and intent: replace the current description value with
something like "NVFP4 MLP weight-only, no activation quantization, FP8 KV cache,
max calibration" so it reads cleanly and explicitly references weight-only MLP,
no activation quantization, FP8 KV cache, and max calibration.
modelopt_recipes/configs/ptq/kv_fp8.yml (1)

16-19: Document what num_bits: [4, 3] means here.

This file is a shared base used by multiple recipes, but the FP8 encoding behind [4, 3] is not obvious at a glance. A short inline comment would make future edits much safer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt_recipes/configs/ptq/kv_fp8.yml` around lines 16 - 19, Add a short
inline comment next to the num_bits entry in the '*[kv]_bmm_quantizer' block
explaining that num_bits: [4, 3] specifies the FP8 format (4 exponent bits and 3
mantissa/significand bits, i.e., E4M3) and note any relevant bias/representation
detail or mapping used by your quantizer implementation so future editors
understand the encoding expected by kv_fp8.yml and the quantizer code paths
(e.g., references to E4M3 vs E3M4 conventions).
modelopt/torch/quantization/config.py (1)

149-168: Consider the implications of module-level assertions.

The assertions at module import time (line 166-168 and throughout the file) tightly couple Python code to YAML config files. This has two concerns:

  1. Import failure risk: If any referenced YAML file is missing or doesn't match, the entire modelopt.torch.quantization.config module fails to import, breaking all downstream code.

  2. Assertion bypass: With python -O, assertions are disabled, silently skipping these consistency checks.

If these checks are critical, consider:

  • Moving them to a dedicated test file
  • Using explicit if ... raise RuntimeError(...) instead of assert
  • Adding a lazy validation approach triggered on first use
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 149 - 168, The module
currently performs a module-level assert comparing
OmegaConf.to_container(OmegaConf.create({"quant_cfg":
_default_disabled_quantizer_cfg}), resolve=True) to
load_config("configs/ptq/base"), which can break imports or be bypassed under
python -O; replace that assert with an explicit runtime check that raises a
clear RuntimeError if the configs mismatch (e.g., compute the two values using
OmegaConf.to_container and load_config and if they differ raise RuntimeError
with a descriptive message), or alternatively move this validation into a
dedicated unit test or a lazily-invoked validator function called on first use;
ensure you update the code around _default_disabled_quantizer_cfg,
OmegaConf.to_container/OmegaConf.create, and load_config(...) accordingly.
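One way to realize the lazy-validation option suggested above is a cached check invoked at the public entry points rather than at import time. This is a sketch with stand-in loaders (the `_load_expected`/`_build_default` names are hypothetical, not modelopt functions):

```python
import functools

def _load_expected() -> dict:
    # Stand-in for load_config("configs/ptq/base").
    return {"quant_cfg": {"default": {"enable": False}}}

def _build_default() -> dict:
    # Stand-in for the in-code _default_disabled_quantizer_cfg.
    return {"quant_cfg": {"default": {"enable": False}}}

@functools.lru_cache(maxsize=1)
def validate_default_cfg() -> bool:
    """Run the YAML/code consistency check once, on first use, with a real exception."""
    expected, actual = _load_expected(), _build_default()
    if expected != actual:
        raise RuntimeError(
            "Built-in default quantizer cfg diverged from configs/ptq/base: "
            f"{actual!r} != {expected!r}"
        )
    return True
```

Because RuntimeError is raised unconditionally on mismatch, `python -O` cannot silently skip the check, and a missing YAML file only fails the code paths that actually need it instead of breaking the module import.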
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_ptq/multinode_ptq.py`:
- Around line 330-347: QUANT_CFG_CHOICES[args.qformat] is a shared template and
must be cloned before mutation to avoid leaking changes; import the stdlib copy
module and call copy.deepcopy on QUANT_CFG_CHOICES[args.qformat] into quant_cfg
before passing it to build_quant_cfg, and likewise ensure when calling
mtq.update_quant_cfg_with_kv_cache_quant you pass a cloned quant_cfg (or
deepcopy the KV_QUANT_CFG_CHOICES[args.kv_cache_qformat]["quant_cfg"] if that is
mutated) so neither build_quant_cfg nor update_quant_cfg_with_kv_cache_quant
mutates module-level objects.

In `@modelopt_recipes/configs/ptq/kv_nvfp4_affine.yml`:
- Around line 19-26: The YAML places "type: dynamic" and "scale_bits: [4, 3]"
under block_sizes and "type: static" under bias, but these should be sibling
properties of the root quantization config rather than nested; move "type:
dynamic" and "scale_bits: [4, 3]" out from inside block_sizes and place them at
the same level as block_sizes (e.g., alongside block_sizes: and bias:), and
likewise move "type: static" out from inside bias to be a sibling property (or
rename it if the schema expects a different bias quantization key); update the
entries for block_sizes, bias, type, and scale_bits accordingly so they are
top-level siblings in that config block.

In `@modelopt_recipes/configs/ptq/kv_nvfp4_rotate.yml`:
- Around line 22-27: The YAML nests type: dynamic and scale_bits: [4, 3] under
block_sizes within k_bmm_quantizer and v_bmm_quantizer; move those keys out of
the block_sizes mapping so block_sizes only contains size mappings (e.g., -1:
16) and place type and scale_bits as sibling keys of block_sizes under
k_bmm_quantizer and v_bmm_quantizer respectively, preserving enable and rotate
keys.

In `@modelopt_recipes/configs/ptq/w4_int4.yml`:
- Around line 19-22: The YAML has `type: static` incorrectly nested under
`block_sizes`; move the `type` key so it is a sibling of `block_sizes` (i.e.,
same indentation level) within the *weight_quantizer mapping and keep `enable:
true` as a sibling as well so the mapping contains `block_sizes`, `type`, and
`enable` at the same level consistent with other configs (e.g.,
w8a8_mxfp8_mxfp8.yml).

In `@modelopt_recipes/configs/ptq/w4_nvfp4_mlp_only.yml`:
- Around line 19-22: The YAML currently nests type and scale_bits inside the
block_sizes map; unindent the keys so that type and scale_bits are top-level
fields alongside block_sizes (i.e., align their indentation with block_sizes
rather than being children of it) to restore the expected quantizer schema; do
the same fix for the other identical block later in the file (the second
occurrence of block_sizes/type/scale_bits).

In `@modelopt_recipes/configs/ptq/w4a4_nvfp4_nvfp4.yml`:
- Around line 18-23: The YAML has misplaced keys: move the keys "type: dynamic"
and "scale_bits: [4, 3]" out from under the "block_sizes" mapping so they are
siblings of "block_sizes" (same indentation level as "block_sizes", "num_bits",
and "enable"); do the same correction for the second quantizer block (the other
occurrence containing "block_sizes" and "num_bits") so both quantizer blocks
have "type" and "scale_bits" at the correct top-level within each block.

In `@modelopt_recipes/configs/ptq/w4a8_int4_blockwise_fp8_fp8.yml`:
- Around line 17-25: The YAML has `type: static` incorrectly nested under
`block_sizes` within the `*weight_quantizer` mapping; move `type: static` out of
the `block_sizes` mapping so it is a sibling property of `num_bits` and
`block_sizes` under `*weight_quantizer` (i.e., align `type` with `num_bits` and
`enable`), and verify both quantizer entries (the single `num_bits: 4` block
with `block_sizes` and the `num_bits: [4, 3]` block) follow the same correct
indentation/structure so the parser treats `type`, `block_sizes`, `num_bits`,
and `enable` as peers rather than nesting `type` inside `block_sizes`.

In `@modelopt_recipes/configs/ptq/w4a8_mxfp4_fp8.yml`:
- Around line 24-26: The YAML has an incomplete key under the '*input_quantizer'
mapping: the 'axis:' key is empty which causes parsing to produce null; update
the mapping anchored by '*input_quantizer' to provide a concrete value (for
example "axis: 0" or "axis: -1" or the intended integer/list), or remove the
'axis:' line entirely if axis is not required—ensure the chosen value matches
other quantizer definitions (search for '*input_quantizer' and any uses of
'axis' in related quantizer configs to pick the correct value).
- Around line 17-23: The YAML has `type: dynamic` and `scale_bits: [8, 0]`
incorrectly nested under the `block_sizes` mapping for the '*weight_quantizer'
section; update the '*weight_quantizer' config so that `block_sizes` only
contains the size mapping (e.g., `-1: 32`) and move `type: dynamic` and
`scale_bits: [8, 0]` out to be sibling keys alongside `num_bits`, `block_sizes`,
and `enable` (i.e., ensure `num_bits`, `block_sizes`, `type`, `scale_bits`, and
`enable` are all at the same indentation level under '*weight_quantizer').

In `@modelopt_recipes/configs/ptq/w6a6_mxfp6_mxfp6.yml`:
- Around line 17-30: The YAML for both "*weight_quantizer" and
"*input_quantizer" has "type: dynamic" and "scale_bits: [8, 0]" mistakenly
nested under "block_sizes"; move "type" and "scale_bits" out so they are sibling
keys of "block_sizes" (i.e., ensure "block_sizes" only contains the mapping for
-1: 32 and "type" and "scale_bits" sit at the same level as "num_bits",
"block_sizes", and "enable"); update both quantizer blocks ("*weight_quantizer"
and "*input_quantizer") accordingly.

In `@modelopt_recipes/configs/ptq/w8a8_mxfp8_mxfp8.yml`:
- Around line 19-23: The YAML nests type: dynamic and scale_bits: [8, 0] under
the block_sizes mapping for both *weight_quantizer and *input_quantizer, which
makes them keys inside block_sizes; move the type and scale_bits entries out one
indentation level so they are sibling properties of block_sizes (i.e., keep
block_sizes: { -1: 32 } and add type: dynamic and scale_bits: [8, 0] alongside
block_sizes) for both *weight_quantizer and *input_quantizer.

In `@modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/kv_quant.yml`:
- Line 1: The file header is malformed because the `__base__` prefix was
accidentally prepended to the SPDX license comment; remove the `__base__` token
so the first line is a valid YAML comment (e.g., start with `#
SPDX-FileCopyrightText: ...`) to restore proper YAML syntax in kv_quant.yml and
ensure any base inclusion is handled via the intended YAML inheritance mechanism
rather than inline prefix.

In `@modelopt/torch/opt/config.py`:
- Around line 401-430: The current loader picks a config_path from
paths_to_check but then resolves entries in config_data["__base__"] by calling
load_config(base_cfg) directly, which lets bare base names resolve against the
caller CWD instead of the originating recipe library; update the base resolution
so that for each base_cfg you first resolve it relative to the chosen
config_path (or relative to BUILTIN_RECIPES_LIB when the matched path came from
that lib): when iterating bases = config_data.pop("__base__", []), if base_cfg
is a relative string/path (not absolute) join it with config_path.parent (or the
library path used to find config_path) and normalize suffixes (add .yml/.yaml
probing like the original search) before calling load_config on the fully
resolved Path; ensure you reference config_path, BUILTIN_RECIPES_LIB and
load_config in the change so built-in bases remain anchored to their recipe
library.

In `@modelopt/torch/recipe/config.py`:
- Around line 78-83: Replace the untyped dict kv_quant with a dedicated Pydantic
config model (e.g., class KVQuantConfig(BaseModel) or `@dataclass`) that mirrors
the fields/validation used by model_quant, then change the kv_quant field to use
that type (kv_quant: KVQuantConfig = ModeloptField(...)) and set a proper
default via default or default_factory (not a bare {}), keep
validate_default=True, and add the necessary import for the new KVQuantConfig so
schema validation and IDE hints are preserved for the KV cache quant settings.
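A minimal sketch of the typed replacement. This uses a plain `@dataclass` stand-in (the real PR would presumably use a Pydantic `BaseModel` with `ModeloptField`); the class and field names are illustrative. The key points are the concrete type and the `default_factory` instead of a shared bare `{}`:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class KVQuantConfig:
    """Hypothetical typed container for KV-cache quant settings.
    default_factory gives each instance its own dict, avoiding the
    shared-mutable-default pitfall of a bare {} default."""
    quant_cfg: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        if not isinstance(self.quant_cfg, dict):
            raise ValueError("quant_cfg must be a mapping of quantizer patterns")
```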

In `@modelopt/torch/recipe/loader.py`:
- Around line 117-122: The code attempts to instantiate QuantizeQuantCfgType in
ModelOptPTQRecipe construction, but QuantizeQuantCfgType is a type alias (not
callable) so change the call site in the loader where ModelOptPTQRecipe(...) is
returned: stop using kv_quant=QuantizeQuantCfgType(**kv_quant_config) and
instead pass the raw dict (kv_quant=kv_quant_config) since
ModelOptPTQRecipe.kv_quant expects dict[str, Any]; keep
model_quant=QuantizeConfig(**model_quant_config) as-is.

In `@pyproject.toml`:
- Around line 121-122: The package-data section is missing an entry for the
modelopt_recipes package so its YAML recipe files aren’t included in built
distributions; update the [tool.setuptools.package-data] table to add a key for
modelopt_recipes that includes the recipe file patterns (e.g., "**/*.yml",
"**/*.yaml") so setuptools includes those files in wheels/sdists; ensure the new
entry mirrors the existing modelopt patterns but targets the modelopt_recipes
package name so recipe loading works after installation.

---

Outside diff comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 196-241: The function build_quant_cfg mutates the input quant_cfg
in multiple branches (moe_calib_experts_ratio, Gemma int8_sq, phi4mm), causing
callers to observe leaked changes; fix this by making a deep copy of quant_cfg
immediately on entry (e.g., quant_cfg = copy.deepcopy(quant_cfg)) and then
perform all modifications on that copy (so weight_quantizer,
quant_cfg["algorithm"], and quant_cfg["quant_cfg"] edits do not affect the
caller's object). Ensure the deepcopy happens before any reads/writes that may
alter nested structures in build_quant_cfg.
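The defensive-copy pattern can be sketched as follows; the mutations shown are placeholders for the real branch logic in build_quant_cfg, and the important part is that the deepcopy happens before any of them:

```python
import copy

def build_quant_cfg(quant_cfg: dict) -> dict:
    """Sketch of the fix: mutate only a private copy so the caller's
    (possibly module-level) config object is never changed."""
    quant_cfg = copy.deepcopy(quant_cfg)  # must precede all mutations
    # ...placeholder edits standing in for the moe/Gemma/phi4mm branches:
    quant_cfg.setdefault("algorithm", "max")
    quant_cfg["quant_cfg"]["*weight_quantizer"] = {"enable": True}
    return quant_cfg
```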

---

Minor comments:
In `@modelopt_recipes/configs/ptq/kv_fp8_affine.yml`:
- Around line 16-22: In the quant_cfg block for '*[kv]_bmm_quantizer' add the
missing sibling property "enable: true" (next to num_bits and bias) so the entry
reads the same structure as other configs (e.g., kv_fp8.yml and
kv_nvfp4_affine.yml); update the YAML under the '*[kv]_bmm_quantizer' key to
include enable: true directly beneath num_bits and alongside bias to enable the
quantizer.

In `@modelopt_recipes/configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml`:
- Around line 1-2: The file name contains a duplicated token ("_nvfp4_mlp_only")
— rename the config file from w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml to
w4a4_nvfp4_mlp_only.yml and update any references to this file (e.g., in tests,
loaders, or CI) so they point to the new filename to keep naming consistent with
other configs.

In `@modelopt_recipes/configs/ptq/w8a8_int8_per_channel_int8.yml`:
- Around line 20-22: The YAML fragment under the "*input_quantizer" mapping
currently contains a blank "axis:" key which parses to null but is ambiguous;
update the mapping by either setting "axis: null" explicitly or removing the
"axis" key entirely so the schema default (None) is used—apply the same change
wherever you see the blank "axis:" pattern (e.g., in the w8a8_fp8_fp8.yml and
w4a8_mxfp4_fp8.yml configs) to keep the intent consistent with Python tests and
code.

In `@modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv/recipe.yml`:
- Line 18: The description string currently reads "FP8 2D blockwise weights, FP8
activation, FP8 KV Cache, max calibration." but the recipe is weight-only (see
weight_only and base model_quant.yml reference to
configs/ptq/w8_fp8_2d_blockwise); update the description to accurately reflect
that only weights are quantized (e.g., "FP8 2D blockwise weight-only
quantization, FP8 KV Cache, max calibration") and remove or reword any mention
of FP8 activation so catalog metadata matches the recipe name and base config.

In `@modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/recipe.yml`:
- Line 18: Description string contains a typo "per tokenactivation"; update the
description value in recipe.yml (the description field) to read "FP8 per channel
weight, FP8 per token activation, FP8 KV Cache, max calibration." — locate the
description key in
modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/recipe.yml and
insert the missing space between "token" and "activation".

In `@modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/model_quant.yml`:
- Around line 16-19: Update the duplicated config reference: replace the base
entry "configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only" with the corrected name
"configs/ptq/w4a4_nvfp4_mlp_only" in the __base__ list inside model_quant.yml;
also ensure the actual file on disk is renamed from
w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml to w4a4_nvfp4_mlp_only.yml so the new
reference resolves correctly.

In `@modelopt_recipes/general/README`:
- Line 3: Fix the typo in the README line that currently reads "ptq/*:
model-agnostic general Post Traing Quantization recipes" by changing "Traing" to
"Training" so it reads "ptq/*: model-agnostic general Post Training Quantization
recipes"; locate the string containing "ptq/*" or "Post Traing Quantization" and
update it accordingly.

In `@modelopt/torch/opt/config.py`:
- Around line 426-430: The code reads YAML into config_data and then assumes
it's a mapping when building bases in load_config; validate that config_data is
a dict after yaml.safe_load(content) and raise a clear ValueError if not (e.g.,
for None, list, or scalar documents) so recipe load failures remain actionable;
update the logic in the function around config_data and the bases =
[load_config(...)] line to check isinstance(config_data, dict) and raise with a
message referencing config_path and the unexpected type.

In `@modelopt/torch/quantization/config.py`:
- Around line 845-853: The tuple_num_bits field validator currently uses assert
to check list length, which can be disabled; update the validator
(tuple_num_bits in modelopt/torch/quantization/config.py) to raise a ValueError
(or TypeError) with a clear message when num_bits is a list but len(num_bits) !=
2, then return the cast tuple as before; leave other behavior unchanged.

In `@modelopt/torch/recipe/config.py`:
- Around line 35-38: The docstrings in the public configuration classes in
modelopt/torch/recipe/config.py currently contain incorrect, copied text about
layer-rule behavior; update the docstrings to accurately describe each class's
purpose and public API (e.g., the base configuration class for model
optimization recipes) and remove the unrelated sentence about matching
"*output_layer*" and replacing attributes; ensure both occurrences (the base
config class and the other public config class in this file) describe their
configuration fields, defaults, and intended usage for generated docs and IDE
help.

In `@modelopt/torch/recipe/loader.py`:
- Around line 76-79: Replace the fragile assert-based checks for the incoming
recipe_descriptor with explicit validation that raises ValueError: check that
recipe_descriptor is a dict and if not raise ValueError("Recipe descriptor
should be a dictionary."); then check that "recipe_type" is present and if not
raise ValueError("Recipe descriptor should contain 'recipe_type' field.").
Update the validation in loader.py where recipe_descriptor is handled so the
function (the block containing the current assert lines) raises these
ValueErrors instead of using assert.
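The explicit validation reads like this (a sketch with a hypothetical helper name; the messages are the ones suggested above):

```python
def validate_recipe_descriptor(recipe_descriptor) -> None:
    """Explicit checks that still fire under `python -O`,
    where bare asserts are stripped."""
    if not isinstance(recipe_descriptor, dict):
        raise ValueError("Recipe descriptor should be a dictionary.")
    if "recipe_type" not in recipe_descriptor:
        raise ValueError("Recipe descriptor should contain 'recipe_type' field.")
```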

---

Nitpick comments:
In `@modelopt_recipes/configs/ptq/calib_algo_none.yml`:
- Line 16: The YAML key "algorithm" currently parses to null implicitly; make
the "no calibration" intent explicit by setting the algorithm field to an
explicit null sentinel (e.g., use the YAML null token) so readers and config
composition logic see the intended "no calibration" value for the algorithm key.
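Both spellings parse to the same value; the explicit token only changes what a reader sees (assuming PyYAML semantics, which OmegaConf follows here):

```python
import yaml

# A bare "algorithm:" key parses to None implicitly; "algorithm: null"
# states the no-calibration intent explicitly with the identical result.
implicit = yaml.safe_load("algorithm:")
explicit = yaml.safe_load("algorithm: null")
assert implicit == explicit == {"algorithm": None}
```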

In `@modelopt_recipes/configs/ptq/kv_fp8.yml`:
- Around line 16-19: Add a short inline comment next to the num_bits entry in
the '*[kv]_bmm_quantizer' block explaining that num_bits: [4, 3] specifies the
FP8 format (4 exponent bits and 3 mantissa/significand bits, i.e., E4M3) and
note any relevant bias/representation detail or mapping used by your quantizer
implementation so future editors understand the encoding expected by kv_fp8.yml
and the quantizer code paths (e.g., references to E4M3 vs E3M4 conventions).

In `@modelopt_recipes/configs/README`:
- Around line 1-3: Update the README sentence "The recipes can refer to the
configs using path of: configs/..." to clearer wording; replace it with "Recipes
can reference shared configs via paths under configs/..." so the reference root
is explicit and reads naturally (edit the README line containing that original
phrase).

In `@modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv/recipe.yml`:
- Line 18: Update the description string in recipe.yml to a clearer phrasing
that matches the folder name and intent: replace the current description value
with something like "NVFP4 MLP weight-only, no activation quantization, FP8 KV
cache, max calibration" so it reads cleanly and explicitly references
weight-only MLP, no activation quantization, FP8 KV cache, and max calibration.

In `@modelopt/torch/quantization/config.py`:
- Around line 149-168: The module currently performs a module-level assert
comparing OmegaConf.to_container(OmegaConf.create({"quant_cfg":
_default_disabled_quantizer_cfg}), resolve=True) to
load_config("configs/ptq/base"), which can break imports or be bypassed under
python -O; replace that assert with an explicit runtime check that raises a
clear RuntimeError if the configs mismatch (e.g., compute the two values using
OmegaConf.to_container and load_config and if they differ raise RuntimeError
with a descriptive message), or alternatively move this validation into a
dedicated unit test or a lazily-invoked validator function called on first use;
ensure you update the code around _default_disabled_quantizer_cfg,
OmegaConf.to_container/OmegaConf.create, and load_config(...) accordingly.
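The explicit-check alternative can be sketched with a hypothetical helper (the function name and message wording are illustrative; the real code would compare the OmegaConf-built container against load_config output):

```python
def check_base_config_matches(built_cfg: dict, loaded_cfg: dict) -> None:
    """Replacement sketch for the module-level assert: survives
    `python -O` and fails with an actionable message instead of a
    bare AssertionError at import time."""
    if built_cfg != loaded_cfg:
        raise RuntimeError(
            "Built-in default quantizer config does not match configs/ptq/base; "
            "update one of them so they stay in sync.\n"
            f"built: {built_cfg}\nloaded: {loaded_cfg}"
        )
```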

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 215f48fb-1b5f-4f50-a528-f88414905cc3

📥 Commits

Reviewing files that changed from the base of the PR and between be6dfad and c2d3678.

📒 Files selected for processing (118)
  • examples/llm_ptq/example_utils.py
  • examples/llm_ptq/hf_ptq.py
  • examples/llm_ptq/multinode_ptq.py
  • modelopt/torch/opt/config.py
  • modelopt/torch/quantization/config.py
  • modelopt/torch/recipe/__init__.py
  • modelopt/torch/recipe/config.py
  • modelopt/torch/recipe/loader.py
  • modelopt_recipes/__init__.py
  • modelopt_recipes/configs/README
  • modelopt_recipes/configs/ptq/base.yml
  • modelopt_recipes/configs/ptq/calib_algo_awq_clip.yml
  • modelopt_recipes/configs/ptq/calib_algo_awq_full.yml
  • modelopt_recipes/configs/ptq/calib_algo_awq_lite.yml
  • modelopt_recipes/configs/ptq/calib_algo_max.yml
  • modelopt_recipes/configs/ptq/calib_algo_none.yml
  • modelopt_recipes/configs/ptq/calib_algo_smoothquant.yml
  • modelopt_recipes/configs/ptq/calib_algo_svdquant.yml
  • modelopt_recipes/configs/ptq/kv_fp8.yml
  • modelopt_recipes/configs/ptq/kv_fp8_affine.yml
  • modelopt_recipes/configs/ptq/kv_nvfp4.yml
  • modelopt_recipes/configs/ptq/kv_nvfp4_affine.yml
  • modelopt_recipes/configs/ptq/kv_nvfp4_rotate.yml
  • modelopt_recipes/configs/ptq/mha_fp8.yml
  • modelopt_recipes/configs/ptq/w4_int4.yml
  • modelopt_recipes/configs/ptq/w4_int4_blockwise.yml
  • modelopt_recipes/configs/ptq/w4_mxfp4_mlp_only.yml
  • modelopt_recipes/configs/ptq/w4_nvfp4_mlp_only.yml
  • modelopt_recipes/configs/ptq/w4a4_mxfp4_mxfp4.yml
  • modelopt_recipes/configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml
  • modelopt_recipes/configs/ptq/w4a4_nvfp4_nvfp4.yml
  • modelopt_recipes/configs/ptq/w4a8_int4_blockwise_fp8_fp8.yml
  • modelopt_recipes/configs/ptq/w4a8_mxfp4_fp8.yml
  • modelopt_recipes/configs/ptq/w4a8_nvfp4_fp8.yml
  • modelopt_recipes/configs/ptq/w6a6_mxfp6_mxfp6.yml
  • modelopt_recipes/configs/ptq/w8_fp8_2d_blockwise.yml
  • modelopt_recipes/configs/ptq/w8_int8.yml
  • modelopt_recipes/configs/ptq/w8a8_fp8_fp8.yml
  • modelopt_recipes/configs/ptq/w8a8_fp8_per_channel_fp8_per_token.yml
  • modelopt_recipes/configs/ptq/w8a8_int8_per_channel_int8.yml
  • modelopt_recipes/configs/ptq/w8a8_mxfp8_mxfp8.yml
  • modelopt_recipes/configs/ptq/w8a8_mxint8_mxint8.yml
  • modelopt_recipes/general/README
  • modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/fp8_default-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/fp8_default-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/fp8_default-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/int4_awq-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/int4_awq-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/int4_awq-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/int4_blockwise_weight_only-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/int4_blockwise_weight_only-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/int4_blockwise_weight_only-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/int8_default-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/int8_default-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/int8_default-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/int8_smoothquant-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/int8_smoothquant-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/int8_smoothquant-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/int8_weight_only-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/int8_weight_only-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/int8_weight_only-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mxfp4_default-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mxfp4_default-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mxfp4_default-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mxfp4_mlp_weight_only-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mxfp4_mlp_weight_only-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mxfp4_mlp_weight_only-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mxfp6_default-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mxfp6_default-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mxfp6_default-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mxfp8_default-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mxfp8_default-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mxfp8_default-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mxint8_default-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mxint8_default-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mxint8_default-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_clip-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_clip-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_clip-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_full-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_full-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_full-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_lite-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_lite-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_lite-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_default-fp8_kv/README
  • modelopt_recipes/general/ptq/nvfp4_default-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_default-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_default-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_fp8_mha-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_fp8_mha-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_fp8_mha-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_svdquant_default-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_svdquant_default-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_svdquant_default-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/w4a8_awq_beta-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/w4a8_awq_beta-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/w4a8_awq_beta-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/w4a8_mxfp4_fp8-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/w4a8_mxfp4_fp8-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/w4a8_mxfp4_fp8-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/w4a8_nvfp4_fp8-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/w4a8_nvfp4_fp8-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/w4a8_nvfp4_fp8-fp8_kv/recipe.yml
  • pyproject.toml
  • tests/_test_utils/torch/quantization/quantize_common.py

@codecov

codecov bot commented Mar 6, 2026

Codecov Report

❌ Patch coverage is 90.38462% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.31%. Comparing base (bc87981) to head (a6529be).

Files with missing lines Patch % Lines
modelopt/recipe/loader.py 80.51% 15 Missing ⚠️
modelopt/recipe/_config_loader.py 85.13% 11 Missing ⚠️
modelopt/recipe/config.py 90.47% 2 Missing ⚠️
modelopt/torch/quantization/utils.py 33.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1000      +/-   ##
==========================================
+ Coverage   70.11%   70.31%   +0.19%     
==========================================
  Files         221      225       +4     
  Lines       25459    25727     +268     
==========================================
+ Hits        17851    18090     +239     
- Misses       7608     7637      +29     

☔ View full report in Codecov by Sentry.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 801-809: The tuple_num_bits validator currently uses an assert to
check list length which can be disabled and is inconsistent with other
validators; update tuple_num_bits (the `@field_validator` for "num_bits") to raise
a ValueError when a provided list does not have length 2, e.g. replace the
assert len(num_bits) == 2 with an explicit if check that raises
ValueError("num_bits list must have length 2") so invalid inputs are rejected
consistently.
- Around line 972-980: The return annotation for the validator
tuple_block_sizes_scale_bits is wrong: it declares -> int | tuple[int, int] but
the validator receives and returns the block_sizes dict (possibly mutated).
Update the type annotation to reflect that it returns a dict (e.g., Mapping[str,
Any] or dict[str, Any]) and keep the logic that converts v["scale_bits"] from
list to tuple when present; ensure the decorator `@field_validator`("block_sizes",
mode="before") and method name tuple_block_sizes_scale_bits remain unchanged so
the validator still runs for the block_sizes field.
- Around line 148-152: The _default_disabled_quantizer_cfg uses fully-qualified
keys ("torch.nn.BatchNorm1d", etc.) that don't match the registry keys; update
the dict keys in _default_disabled_quantizer_cfg to the registered identifiers
("nn.BatchNorm1d", "nn.BatchNorm2d", "nn.BatchNorm3d", "nn.LeakyReLU") so
set_quantizer_by_cfg() can correctly find entries in QuantModuleRegistry (see
quant_batchnorm.py and quant_activations.py) and properly disable BatchNorm and
LeakyReLU quantizers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 39b7a440-e093-46d3-bba8-f01e98f0aa9c

📥 Commits

Reviewing files that changed from the base of the PR and between c2d3678 and a720436.

📒 Files selected for processing (1)
  • modelopt/torch/quantization/config.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (2)
modelopt/torch/quantization/config.py (2)

852-861: ⚠️ Potential issue | 🟡 Minor

Use ValueError instead of assert for input validation.

Line 857 uses assert len(num_bits) == 2 which is inconsistent with other validators in this class (e.g., validate_num_bits, validate_bias) that raise ValueError. Additionally, assertions can be disabled with python -O, allowing invalid inputs to pass silently.

📝 Proposed fix
     @field_validator("num_bits", mode="before")
     @classmethod
     def tuple_num_bits(cls, num_bits: int | list[int] | tuple[int, int]) -> int | tuple[int, int]:
         """Convert num_bits to tuple if list."""
         if isinstance(num_bits, list):
-            assert len(num_bits) == 2
+            if len(num_bits) != 2:
+                raise ValueError(f"num_bits list must have exactly 2 elements, got {len(num_bits)}")
             return cast("tuple[int, int]", tuple(num_bits))
 
         return num_bits
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 852 - 861, The
tuple_num_bits field validator uses an assert to check list length; replace the
assert with raising a ValueError when len(num_bits) != 2 so invalid inputs
cannot slip through when Python assertions are disabled. In the tuple_num_bits
method (decorated with `@field_validator`("num_bits", mode="before")), validate
that a list has exactly two elements and raise ValueError("num_bits list must
have length 2") (or similar) before casting and returning the tuple; keep the
existing cast("tuple[int, int]", tuple(num_bits)) and return semantics
unchanged.

1023-1031: ⚠️ Potential issue | 🟡 Minor

Incorrect return type annotation.

The return type -> int | tuple[int, int] is incorrect. This validator operates on block_sizes which is a dict, and returns the (potentially modified) dict v.

📝 Proposed fix
     @field_validator("block_sizes", mode="before")
     @classmethod
-    def tuple_block_sizes_scale_bits(cls, v) -> int | tuple[int, int]:
+    def tuple_block_sizes_scale_bits(
+        cls, v: dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None
+    ) -> dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None:
         """Convert block_sizes.scale_bits to tuple if list."""
         if v and v.get("scale_bits"):
             scale_bits = v.get("scale_bits")
             if isinstance(scale_bits, list):
                 v["scale_bits"] = tuple(scale_bits)
         return v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1023 - 1031, The
validator tuple_block_sizes_scale_bits is annotated to return int | tuple[int,
int] but it actually accepts and returns the block_sizes dict (possibly
mutated); update its return type annotation to reflect a dict (e.g., dict[str,
Any] or Mapping[str, Any]) so the `@field_validator` signature matches the actual
behavior and avoid type-checker complaints.
🧹 Nitpick comments (1)
modelopt/torch/quantization/config.py (1)

1033-1050: Existing validators use assertions inconsistently.

For future consideration: The validate_block_sizes validator uses assert statements (lines 1039, 1041, 1046, 1048) instead of raising ValueError, which is inconsistent with the pattern used in validate_bias and other validators. This is existing code and not blocking, but worth noting for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1033 - 1050, The
validator function validate_block_sizes uses Python assert statements for input
validation which should be replaced with explicit exceptions for consistent
runtime validation: locate validate_block_sizes (decorated with
`@field_validator`) and replace each assert with raising ValueError (or TypeError
where appropriate) including the same descriptive messages — e.g., check
info.data["axis"] is None and raise ValueError("axis must be None when
block_sizes is not None."), when dynamic type is detected use
cls._get_block_quant_axes_and_sizes(v) and raise ValueError("Dynamic block
quantization only supports quantization last axis.") if the length check fails,
and for the key/value checks raise ValueError describing invalid keys or types
(e.g., "invalid block_sizes key" or "block_sizes values must be int or None").
Ensure the validator still returns v on success.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/recipe.yml`:
- Around line 18-19: Update the YAML description to accurately reflect the
recipe variant: replace the current "Mamba MoE FP8 Conservative with FP8 KV
cache." text in the description field with a phrase matching the file name and
variant (e.g., "Mamba MoE NVFP4 Aggressive with FP8 KV cache.") so the metadata
aligns with mamba_moe_nvfp4_aggressive-fp8_kv; ensure the description string
mentions NVFP4 and Aggressive to avoid misleading catalog entries.

In `@modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/kv_quant.yml`:
- Line 1: Remove the stray "__base__" token that was prepended to the license
header in kv_quant.yml so the SPDX comment is a proper YAML comment (i.e.,
restore the header to start with the comment text only), leaving the existing
"__base__:" mapping later in the file unchanged; specifically, delete the
leading "__base__" at the top of the file so the license header is not treated
as a YAML scalar and does not conflict with the "__base__:" entry.

In `@modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/model_quant.yml`:
- Around line 16-19: The __base__ list in model_quant.yml is missing the KV
preset; update the __base__ array to include the configs/ptq/kv_fp8.yml entry so
the FP8 KV cache quantization is actually enabled (add "configs/ptq/kv_fp8"
alongside the existing entries). Ensure the same change is applied to the
sibling recipe nvfp4_mlp_only-fp8_kv so both recipes include the kv_fp8 preset.

In
`@modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/recipe.yml`:
- Around line 16-18: The description string in recipe.yml claims "max
calibration" but the PTQ config pointed to by model_quant.yml uses the MSE
calibrator at configs/ptq/calib_algo_mse; update the description field to
accurately state MSE calibration (e.g., replace "max calibration" with "MSE
calibration" or similar) so the recipe description matches the actual
calibration algorithm referenced.

---

Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 852-861: The tuple_num_bits field validator uses an assert to
check list length; replace the assert with raising a ValueError when
len(num_bits) != 2 so invalid inputs cannot slip through when Python assertions
are disabled. In the tuple_num_bits method (decorated with
`@field_validator`("num_bits", mode="before")), validate that a list has exactly
two elements and raise ValueError("num_bits list must have length 2") (or
similar) before casting and returning the tuple; keep the existing
cast("tuple[int, int]", tuple(num_bits)) and return semantics unchanged.
- Around line 1023-1031: The validator tuple_block_sizes_scale_bits is annotated
to return int | tuple[int, int] but it actually accepts and returns the
block_sizes dict (possibly mutated); update its return type annotation to
reflect a dict (e.g., dict[str, Any] or Mapping[str, Any]) so the
`@field_validator` signature matches the actual behavior and avoid type-checker
complaints.

---

Nitpick comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1033-1050: The validator function validate_block_sizes uses Python
assert statements for input validation which should be replaced with explicit
exceptions for consistent runtime validation: locate validate_block_sizes
(decorated with `@field_validator`) and replace each assert with raising
ValueError (or TypeError where appropriate) including the same descriptive
messages — e.g., check info.data["axis"] is None and raise ValueError("axis must
be None when block_sizes is not None."), when dynamic type is detected use
cls._get_block_quant_axes_and_sizes(v) and raise ValueError("Dynamic block
quantization only supports quantization last axis.") if the length check fails,
and for the key/value checks raise ValueError describing invalid keys or types
(e.g., "invalid block_sizes key" or "block_sizes values must be int or None").
Ensure the validator still returns v on success.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 09898683-6983-46be-8644-cb6cacc8d85a

📥 Commits

Reviewing files that changed from the base of the PR and between a720436 and 1a68373.

📒 Files selected for processing (25)
  • modelopt/torch/quantization/config.py
  • modelopt_recipes/configs/ptq/base_mamba.yml
  • modelopt_recipes/configs/ptq/calib_algo_local_hessian.yml
  • modelopt_recipes/configs/ptq/calib_algo_mse.yml
  • modelopt_recipes/configs/ptq/w4a4_nvfp4_static_nvfp4.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/README
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/README
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/recipe.yml
✅ Files skipped from review due to trivial changes (9)
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/recipe.yml
  • modelopt_recipes/configs/ptq/calib_algo_local_hessian.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/README
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/README

Collaborator

@cjluo-nv cjluo-nv left a comment

Review: No-Behavior-Change Correctness

I found one real bug in the new recipe path, plus a few other discrepancies worth flagging.


🐛 Bug: Recipe KV quantization is broken (hf_ptq.py)

In the new recipe branch:

quant_cfg = mtq.update_quant_cfg_with_kv_cache_quant(
    quant_cfg,
    recipe.kv_quant,   # ← wrong
)

recipe.kv_quant is the full config dict loaded from kv_quant.yml (e.g. {"quant_cfg": {"*[kv]_bmm_quantizer": {...}}}).

But update_quant_cfg_with_kv_cache_quant expects the quantizer patterns sub-dict as the second argument (not the full config wrapper). It does:

quant_cfg["quant_cfg"].update(kv_cache_quant_cfg)

So with the current code, this updates the patterns dict with {"quant_cfg": {"*[kv]_bmm_quantizer": ...}} — adding a nested "quant_cfg" key into the patterns level. KV quantization is silently not applied.

The old non-recipe path correctly passes FP8_KV_CFG["quant_cfg"] (the sub-dict). The fix should be:

quant_cfg = mtq.update_quant_cfg_with_kv_cache_quant(
    quant_cfg,
    recipe.kv_quant["quant_cfg"],   # ← take the sub-dict
)

Or redesign recipe.kv_quant to store just the patterns directly (and update the YAML/loader accordingly).


⚠️ Behavior change: NVFP4_FP8_MHA_CONFIG (modelopt/torch/quantization/config.py)

The old config had only "default": {"enable": False} in quant_cfg. The new version replaces that with **_default_disabled_quantizer_cfg, which includes many additional explicit disable patterns (*lm_head*, *block_sparse_moe.gate*, *router*, etc.).

The assert confirms that the YAML matches the updated _DEPRECATED dict, so the two stay consistent, but it IS a semantic change in which patterns are disabled. This should be called out explicitly rather than claimed as "no behavior change."


⚠️ Intentional behavior change: auto_quantize loop variable bug fix (hf_ptq.py)

# Old (buggy):
assert all(
    args.qformat in [...]
    for args.qformat in qformat_list   # mutates args.qformat as side effect!
)

# New (correct):
assert all(
    qformat in [...]
    for qformat in qformat_list
)

This is a legitimate bug fix, but it IS a behavior change: the old code had args.qformat set to the last element of qformat_list after the all() call, which could affect downstream code. Worth acknowledging in the PR description.


💬 Minor: kv_quant.yml is required even for recipes that don't do KV quantization (modelopt/torch/recipe/loader.py)

_load_ptq_recipe raises ValueError if no kv_quant.yml is found. But a recipe might legitimately skip KV quantization. Consider making it optional and defaulting recipe.kv_quant to {}.
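The optional-kv_quant suggestion can be sketched as a small loader helper. The name load_kv_quant_cfg and the injected parse callable below are hypothetical, not the actual loader API; the real _load_ptq_recipe would pass something like yaml.safe_load as the parser:

```python
from pathlib import Path
from typing import Callable


def load_kv_quant_cfg(recipe_dir: str, parse: Callable[[str], dict]) -> dict:
    """Return the parsed kv_quant config, or {} when kv_quant.yml is absent.

    Hypothetical sketch: a missing kv_quant.yml means "no KV quantization"
    instead of raising ValueError, as suggested above.
    """
    kv_path = Path(recipe_dir) / "kv_quant.yml"
    if not kv_path.exists():
        return {}
    return parse(kv_path.read_text()) or {}
```

Callers can then skip the update_quant_cfg_with_kv_cache_quant step entirely when the returned dict is empty.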


✅ Things that look correct

  • tuple→list conversion for num_bits and scale_bits in YAML, with the tuple_num_bits/tuple_block_sizes_scale_bits validators in QuantizerAttributeConfig normalizing them back to tuples: a clean approach.
  • The __base__ YAML inheritance mechanism via OmegaConf merge is well-structured.
  • The assert-based equivalence checks for all the _DEPRECATED configs are a good regression guard.
  • Non-recipe paths in hf_ptq.py and multinode_ptq.py look functionally equivalent to before.
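The __base__ merge order praised above can be illustrated with a minimal pure-Python sketch. The actual implementation merges loaded YAML files with OmegaConf.merge; deep_merge, resolve_config, and the in-memory registry dict below are illustrative stand-ins for that machinery:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base (override wins), mirroring OmegaConf.merge."""
    out = dict(base)
    for key, val in override.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], val)
        else:
            out[key] = val
    return out


def resolve_config(cfg: dict, registry: dict) -> dict:
    """Resolve a config with a __base__ list: merge bases in listed order, then the child."""
    merged: dict = {}
    for base_name in cfg.get("__base__", []):
        merged = deep_merge(merged, resolve_config(registry[base_name], registry))
    # The child's own keys override anything inherited from the bases.
    child = {k: v for k, v in cfg.items() if k != "__base__"}
    return deep_merge(merged, child)
```

With this order, a recipe listing ["configs/ptq/base", "configs/ptq/kv_fp8"] gets the base defaults first, then the KV preset layered on top, then its own overrides.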

Collaborator

@cjluo-nv cjluo-nv left a comment

test comment from Claude Code review

@shengliangxu shengliangxu force-pushed the shengliangx/modeopt-recipe-lib-mvp branch from 1a68373 to 1b4216b on March 10, 2026 at 16:03
Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/torch/quantization/config.py (1)

1541-1565: ⚠️ Potential issue | 🟡 Minor

Remove debug print statements.

Lines 1561 and 1564 contain debug print statements that should be removed or converted to proper logging. These will produce unexpected console output during normal operation.

Proposed fix
         # quantization like W4A8 has a list of weight quantizers
         if isinstance(cfg, list):
             for _config in cfg:
                 if _not_dynamic(_config):
-                    print(f"{cfg}: True")
                     return True
         elif _not_dynamic(cfg):
-            print(f"{cfg}: True")
             return True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1541 - 1565, In
need_calibration(), remove the two debug print statements that emit cfg to
stdout (the prints inside the loop for list items and the single-cfg branch) so
the function does not produce unexpected console output; either delete those
print calls or replace them with a proper logger.debug call using the module
logger (e.g., logging.getLogger(__name__).debug) if you need the diagnostics.
Ensure the changes are made inside the need_calibration function and do not
alter the _not_dynamic helper logic or return behavior.
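The suggested print-to-logger replacement can be sketched as follows. _not_dynamic_example is a hypothetical stand-in for the real _not_dynamic helper (whose logic lives in config.py), kept here only so the sketch is self-contained:

```python
import logging

logger = logging.getLogger(__name__)


def _not_dynamic_example(cfg: dict) -> bool:
    # Hypothetical stand-in for the real _not_dynamic helper.
    return cfg.get("type") != "dynamic" and cfg.get("enable", True)


def need_calibration(cfg) -> bool:
    """Sketch of the suggested fix: logger.debug instead of a bare print."""
    # Quantization like W4A8 has a list of weight quantizer configs.
    if isinstance(cfg, list):
        for _config in cfg:
            if _not_dynamic_example(_config):
                logger.debug("%s: needs calibration", cfg)
                return True
        return False
    if _not_dynamic_example(cfg):
        logger.debug("%s: needs calibration", cfg)
        return True
    return False
```

The diagnostics stay available via the module logger but no longer hit stdout during normal operation.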
♻️ Duplicate comments (2)
modelopt/torch/quantization/config.py (2)

1027-1035: ⚠️ Potential issue | 🟡 Minor

Incorrect return type annotation.

The return type -> int | tuple[int, int] is incorrect. This validator operates on block_sizes which is a dict, and returns the (potentially modified) dict v.

Proposed fix
     @field_validator("block_sizes", mode="before")
     @classmethod
-    def tuple_block_sizes_scale_bits(cls, v) -> int | tuple[int, int]:
+    def tuple_block_sizes_scale_bits(
+        cls, v: dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None
+    ) -> dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None:
         """Convert block_sizes.scale_bits to tuple if list."""
         if v and v.get("scale_bits"):
             scale_bits = v.get("scale_bits")
             if isinstance(scale_bits, list):
                 v["scale_bits"] = tuple(scale_bits)
         return v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1027 - 1035, The
validator tuple_block_sizes_scale_bits is annotated to return int | tuple[int,
int] but it actually receives and returns the block_sizes dict (possibly
modified); update the return type annotation to reflect a mapping/dict (e.g.,
dict[str, Any] or the appropriate BlockSizes type) and adjust imports if needed,
keeping the function name tuple_block_sizes_scale_bits, the
field_validator("block_sizes") decorator, and the scale_bits key handling
unchanged.

856-864: ⚠️ Potential issue | 🟡 Minor

Use ValueError instead of assert for input validation.

Line 861 uses assert len(num_bits) == 2 which is inconsistent with other validators in this class (e.g., validate_num_bits, validate_bias) that raise ValueError. Assertions can be disabled with python -O, allowing invalid inputs to pass silently.

Proposed fix
     @field_validator("num_bits", mode="before")
     @classmethod
     def tuple_num_bits(cls, num_bits: int | list[int] | tuple[int, int]) -> int | tuple[int, int]:
         """Convert num_bits to tuple if list."""
         if isinstance(num_bits, list):
-            assert len(num_bits) == 2
+            if len(num_bits) != 2:
+                raise ValueError(f"num_bits list must have exactly 2 elements, got {len(num_bits)}")
             return cast("tuple[int, int]", tuple(num_bits))
 
         return num_bits
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 856 - 864, The
tuple_num_bits field validator uses an assert to check list length which can be
skipped under -O; replace the assert in tuple_num_bits with an explicit check
that raises ValueError (with a clear message like "num_bits list must have
length 2") when len(num_bits) != 2, then return the casted tuple as before so
behavior matches other validators (e.g., validate_num_bits, validate_bias).
🧹 Nitpick comments (2)
modelopt/torch/quantization/config.py (2)

815-815: Type annotation includes pre-validation types.

The list[int] type in the annotation represents valid input that gets converted to tuple[int, int] by the tuple_num_bits validator. This is acceptable but could be clarified with a comment noting that list[int] inputs are converted to tuples during validation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` at line 815, The type annotation for
the num_bits field currently includes list[int], which is accepted then
converted to tuple[int,int] by the tuple_num_bits validator; update the
declaration for num_bits (the ModeloptField definition) to keep the existing
union types but add a one-line comment explaining that list[int] inputs are
normalized to tuple[int,int] by the tuple_num_bits validator so readers
understand the pre-validation shape conversion.

166-178: Consider explicit checks for config parity validation.

The assert statements for config parity (e.g., lines 166, 178) will be disabled when Python runs with -O flag. If these checks are critical for production correctness, consider using explicit if ... raise RuntimeError(...) instead of assert.

However, if these are purely development-time sanity checks that will be caught in CI, the current approach is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 166 - 178, Replace the
two assert-based parity checks with explicit runtime checks that raise on
mismatch: instead of using assert _default_disabled_quantizer_cfg_deprecated ==
_default_disabled_quantizer_cfg and assert
_mamba_moe_disabled_quantizer_cfg_deprecated ==
_mamba_moe_disabled_quantizer_cfg, implement explicit comparisons and raise a
RuntimeError (or ValueError) with a clear message referencing the mismatched
symbols (_default_disabled_quantizer_cfg_deprecated,
_default_disabled_quantizer_cfg, _mamba_moe_disabled_quantizer_cfg_deprecated,
_mamba_moe_disabled_quantizer_cfg) so these critical config-parity validations
remain active even when Python is run with -O.
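A minimal sketch of the assert-to-exception change for the parity checks; check_config_parity is a hypothetical helper name, not existing code, and the real change could just as well inline the two comparisons:

```python
def check_config_parity(deprecated_cfg: dict, new_cfg: dict, name: str) -> None:
    """Raise RuntimeError on mismatch so the check survives `python -O`,
    unlike a bare assert."""
    if deprecated_cfg != new_cfg:
        raise RuntimeError(
            f"Config parity check failed for {name}: "
            f"deprecated and YAML-derived configs differ."
        )
```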
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1541-1565: In need_calibration(), remove the two debug print
statements that emit cfg to stdout (the prints inside the loop for list items
and the single-cfg branch) so the function does not produce unexpected console
output; either delete those print calls or replace them with a proper
logger.debug call using the module logger (e.g.,
logging.getLogger(__name__).debug) if you need the diagnostics. Ensure the
changes are made inside the need_calibration function and do not alter the
_not_dynamic helper logic or return behavior.

---

Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1027-1035: The validator tuple_block_sizes_scale_bits is annotated
to return int | tuple[int, int] but it actually receives and returns the
block_sizes dict (possibly modified); update the return type annotation to
reflect a mapping/dict (e.g., dict[str, Any] or the appropriate BlockSizes type)
and adjust imports if needed, keeping the function name
tuple_block_sizes_scale_bits, the field_validator("block_sizes") decorator, and
the scale_bits key handling unchanged.
- Around line 856-864: The tuple_num_bits field validator uses an assert to
check list length which can be skipped under -O; replace the assert in
tuple_num_bits with an explicit check that raises ValueError (with a clear
message like "num_bits list must have length 2") when len(num_bits) != 2, then
return the casted tuple as before so behavior matches other validators (e.g.,
validate_num_bits, validate_bias).

---

Nitpick comments:
In `@modelopt/torch/quantization/config.py`:
- Line 815: The type annotation for the num_bits field currently includes
list[int], which is accepted then converted to tuple[int,int] by the
tuple_num_bits validator; update the declaration for num_bits (the ModeloptField
definition) to keep the existing union types but add a one-line comment
explaining that list[int] inputs are normalized to tuple[int,int] by the
tuple_num_bits validator so readers understand the pre-validation shape
conversion.
- Around line 166-178: Replace the two assert-based parity checks with explicit
runtime checks that raise on mismatch: instead of using assert
_default_disabled_quantizer_cfg_deprecated == _default_disabled_quantizer_cfg
and assert _mamba_moe_disabled_quantizer_cfg_deprecated ==
_mamba_moe_disabled_quantizer_cfg, implement explicit comparisons and raise a
RuntimeError (or ValueError) with a clear message referencing the mismatched
symbols (_default_disabled_quantizer_cfg_deprecated,
_default_disabled_quantizer_cfg, _mamba_moe_disabled_quantizer_cfg_deprecated,
_mamba_moe_disabled_quantizer_cfg) so these critical config-parity validations
remain active even when Python is run with -O.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1e11ee75-9da2-4cd9-bc6b-b57c06b19bc5

📥 Commits

Reviewing files that changed from the base of the PR and between 1a68373 and 1b4216b.

📒 Files selected for processing (29)
  • modelopt/torch/quantization/config.py
  • modelopt_recipes/configs/ptq/base_mamba.yml
  • modelopt_recipes/configs/ptq/calib_algo_local_hessian.yml
  • modelopt_recipes/configs/ptq/calib_algo_mse.yml
  • modelopt_recipes/configs/ptq/w4a4_nvfp4_static_nvfp4.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/README
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/README
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/recipe.yml
✅ Files skipped from review due to trivial changes (3)
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/README
  • modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/recipe.yml
🚧 Files skipped from review as they are similar to previous changes (12)
  • modelopt_recipes/configs/ptq/calib_algo_mse.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/kv_quant.yml
  • modelopt_recipes/configs/ptq/base_mamba.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/README
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/recipe.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/model_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/kv_quant.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/recipe.yml

# limitations under the License.

__base__:
- configs/ptq/base
Collaborator

It seems it would be more straightforward to directly include the config here.

Contributor

+1

# See the License for the specific language governing permissions and
# limitations under the License.

recipe_type: ptq
Collaborator

It is not clear to me what this recipe.yml is supposed to do. I recall that you mentioned the calibration dataset here.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

♻️ Duplicate comments (2)
modelopt/torch/quantization/config.py (2)

1031-1039: ⚠️ Potential issue | 🟡 Minor

Incorrect return type annotation.

The return type -> int | tuple[int, int] is incorrect. This validator operates on block_sizes which is a dict, and returns the (potentially modified) dict v.

Proposed fix
     @field_validator("block_sizes", mode="before")
     @classmethod
-    def tuple_block_sizes_scale_bits(cls, v) -> int | tuple[int, int]:
+    def tuple_block_sizes_scale_bits(
+        cls, v: dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None
+    ) -> dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None:
         """Convert block_sizes.scale_bits to tuple if list."""
         if v and v.get("scale_bits"):
             scale_bits = v.get("scale_bits")
             if isinstance(scale_bits, list):
                 v["scale_bits"] = tuple(scale_bits)
         return v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1031 - 1039, The
validator tuple_block_sizes_scale_bits annotated as returning "int | tuple[int,
int]" is wrong because it receives and returns the block_sizes dict; change the
return type to reflect that (e.g., "dict[str, Any] | None" or simply "dict |
None") and import Any if needed; keep the function signature on the
`@field_validator`("block_sizes", mode="before") classmethod and ensure the
implementation still returns the possibly-modified dict v.

860-868: ⚠️ Potential issue | 🟡 Minor

Use ValueError instead of assert for input validation.

Line 865 uses assert len(num_bits) == 2 which can be disabled with python -O, allowing invalid inputs to pass silently. This is inconsistent with other validators in this class (e.g., validate_num_bits, validate_bias) that raise ValueError.

Proposed fix
     @field_validator("num_bits", mode="before")
     @classmethod
     def tuple_num_bits(cls, num_bits: int | list[int] | tuple[int, int]) -> int | tuple[int, int]:
         """Convert num_bits to tuple if list."""
         if isinstance(num_bits, list):
-            assert len(num_bits) == 2
+            if len(num_bits) != 2:
+                raise ValueError(f"num_bits list must have exactly 2 elements, got {len(num_bits)}")
             return cast("tuple[int, int]", tuple(num_bits))
 
         return num_bits
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 860 - 868, In
tuple_num_bits (the `@field_validator` method) replace the assert with explicit
input validation that raises ValueError when the incoming list does not have
length 2: check isinstance(num_bits, list), if so verify len(num_bits) == 2 and
if not raise ValueError("num_bits list must have length 2"), then return
tuple(num_bits) (cast as needed); keep returning num_bits for non-list inputs so
the method behavior matches other validators like validate_num_bits and
validate_bias.
🧹 Nitpick comments (1)
modelopt/torch/opt/config.py (1)

450-473: Base config resolution order allows local files to shadow built-in recipes.

The path search order checks local filesystem paths before BUILTIN_RECIPES_LIB. This means a user's local file at configs/ptq/base.yml could shadow the built-in recipe. While this may be intentional to support user overrides, it could cause unexpected behavior if unrelated local files accidentally match base config names.

Consider documenting this precedence behavior or adding a warning when a local file shadows a built-in recipe.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/config.py` around lines 450 - 473, The current resolution
builds paths_to_check and checks them in an order that prefers local filesystem
files over BUILTIN_RECIPES_LIB, which can let a local config_file shadow a
built-in recipe; update the logic in the path-building/lookup block (the code
that constructs paths_to_check and the subsequent loop that sets config_path) to
detect when both a local candidate (Path or Traversable) and a corresponding
BUILTIN_RECIPES_LIB path exist and either (a) prefer the builtin by changing the
search order, or (b) keep the current precedence but log or raise a warning when
a local file shadows BUILTIN_RECIPES_LIB; implement the chosen behavior using
the existing variables config_file, paths_to_check, BUILTIN_RECIPES_LIB and
config_path so the change is localized and easy to review.
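One way to implement the warn-on-shadow option is sketched below. resolve_config_path and its parameters are hypothetical, standing in for the paths_to_check loop in the actual loader, and local precedence is kept as-is with only a warning added:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def resolve_config_path(name: str, local_roots: list[Path], builtin_root: Path) -> Path:
    """Prefer local files, but warn when one shadows a builtin recipe."""
    builtin = builtin_root / f"{name}.yml"
    for root in local_roots:
        candidate = root / f"{name}.yml"
        if candidate.exists():
            if builtin.exists():
                logger.warning(
                    "Local config %s shadows builtin recipe %s", candidate, builtin
                )
            return candidate
    if builtin.exists():
        return builtin
    raise FileNotFoundError(f"No config found for {name}")
```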
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 897-899: The assertion message is misleading because the
isinstance check verifies the recipe's type, not whether load_recipe succeeded;
replace the assertion or message to reflect a type mismatch (e.g., raise a
TypeError or assert with "Loaded recipe is not a ModelOptPTQRecipe" or similar)
for the variable recipe and class ModelOptPTQRecipe so the failure clearly
indicates an unexpected recipe type rather than a loading error.
- Around line 908-911: The call passes the whole recipe.kv_quant wrapper to
mtq.update_quant_cfg_with_kv_cache_quant which expects the inner quantizer dict;
change the recipe path to pass recipe.kv_quant["quant_cfg"] (same extraction
used in the non-recipe path) so mtq.update_quant_cfg_with_kv_cache_quant
receives the inner dict and can .update(quant_cfg["quant_cfg"]) correctly;
update the call site that currently uses recipe.kv_quant to use
recipe.kv_quant["quant_cfg"] and keep the variable names quant_cfg and
kv_cache_quant_cfg consistent with the existing
update_quant_cfg_with_kv_cache_quant signature.

In `@modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv.yml`:
- Around line 18-19: The YAML description currently mentions "FP8 activation"
which conflicts with the filename "fp8_2d_blockwise_weight_only-fp8_kv.yml" and
the base config "w8_fp8_2d_blockwise" (weight-only); update the description
field to remove any reference to activation quantization and clearly state this
is weight-only FP8 (with FP8 KV cache and max calibration) so it matches the
filename and the w8 base config.

In `@modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv.yml`:
- Around line 18-19: The description string currently reads "FP8 per channel
weight, FP8 per tokenactivation, FP8 KV Cache, max calibration." and contains a
missing space; update the description value in the YAML (the description field)
to "FP8 per channel weight, FP8 per token activation, FP8 KV Cache, max
calibration." so "tokenactivation" is corrected to "token activation".

In `@modelopt_recipes/general/ptq/int4_awq-fp8_kv.yml`:
- Around line 18-19: The YAML description incorrectly claims "Int4 activation"
while the referenced base config `w4_int4` has activation quantization disabled
(`*input_quantizer: enable: false`); update the top-level `description` string
(near the existing "Int4 weight, Int4 activation, FP8 KV Cache, AWQ
calibration.") to remove or correct the activation quantization claim (for
example: "Int4 weights, FP8 KV Cache, AWQ calibration" or similar), ensuring the
text accurately reflects that only weights are int4 and activations are not
quantized.

In `@modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml`:
- Around line 18-25: The recipe description claims "no calibration" but the
__base__ model_quant list includes configs/ptq/calib_algo_max, causing a
mismatch; fix by making them consistent: either replace the inherited preset
configs/ptq/calib_algo_max with configs/ptq/calib_algo_none in the
__base__.model_quant list (to keep "no calibration") or update the description
text to state "max calibration" (if max calibration is intended); adjust the
description or the third entry accordingly so description and the model_quant
inheritance match.

In `@modelopt_recipes/general/ptq/nvfp4_svdquant_default-fp8_kv.yml`:
- Around line 18-19: The YAML description text in
nvfp4_svdquant_default-fp8_kv.yml incorrectly says "FP8 activation" while the
base config referenced (w4a4_nvfp4_nvfp4) uses NVFP4 activations; update the
description field to state "NVFP4 activation" (or otherwise match the base
config wording) so the description accurately reflects the activation format
used by w4a4_nvfp4_nvfp4.

In `@modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv.yml`:
- Around line 18-20: The description string currently says "fp8 scale sweep
calibration" but the recipe name nvfp4_w4a4_weight_local_hessian-fp8_kv and the
base configs/ptq/calib_algo_local_hessian indicate it uses the local-Hessian
calibration; update the description to accurately mention "local Hessian
calibration" (or "local-Hessian fp8 scale calibration") and remove or replace
"fp8 scale sweep calibration" so the metadata matches the selected calibration
algorithm.

In `@modelopt_recipes/general/ptq/w4a8_awq_beta-fp8_kv.yml`:
- Around line 18-19: The description incorrectly states "FP8 weight" while this
recipe (w4a8_awq_beta-fp8_kv.yml) and its base config
(configs/ptq/w4a8_int4_blockwise_fp8_fp8) use Int4 blockwise weights; update the
description to reflect Int4 weights (e.g., mention "Int4 blockwise weights then
FP8 activation and FP8 KV cache") so the metadata matches the recipe name and
base config.

In `@modelopt/torch/recipe/loader.py`:
- Around line 32-45: The function load_recipe currently treats Path/Traversable
differently than str; normalize recipe_path at start (e.g., if
isinstance(recipe_path, (Path, Traversable)) convert to str or extract a Path
object) and then run the same built-in lookup and .yml/.yaml probing used for
str inputs, only skipping the built-in library resolution when the original
input is an absolute filesystem path (use Path.is_absolute() or os.path.isabs to
detect); apply the same normalization to the branch handling the
legacy-directory flow so both code paths share the same resolution logic and
behavior.
- Around line 84-94: The code assumes load_config(recipe_file) returns a dict
and uses assert for validation; change it to first check isinstance(data, dict)
and raise ValueError if not, then replace the assert recipe_type is not None
with raising ValueError describing the missing metadata.recipe_type, and for the
RecipeType.PTQ branch replace assert checks for "model_quant" and "kv_quant"
with explicit membership checks that raise ValueError with clear messages; apply
the same pattern to where recipe_descriptor is loaded (lines referencing
recipe_descriptor) — verify it is a dict before using .get and replace asserts
with ValueError exceptions.
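The assert-to-`ValueError` conversion for the loader can be sketched like this. The field names (`metadata.recipe_type`, `model_quant`, `kv_quant`) come from the review text; the function name `validate_recipe_data` and the enum layout are assumptions, not the real `modelopt/torch/recipe/loader.py` API.

```python
from enum import Enum


class RecipeType(Enum):
    PTQ = "ptq"


def validate_recipe_data(data) -> RecipeType:
    """Validate loaded recipe data with explicit errors instead of asserts."""
    if not isinstance(data, dict):
        raise ValueError(f"Recipe file must contain a mapping, got {type(data).__name__}")
    recipe_type_str = (data.get("metadata") or {}).get("recipe_type")
    if recipe_type_str is None:
        raise ValueError("Recipe is missing required field metadata.recipe_type")
    recipe_type = RecipeType(recipe_type_str)
    if recipe_type is RecipeType.PTQ:
        # PTQ recipes must carry both quantization sections.
        for key in ("model_quant", "kv_quant"):
            if key not in data:
                raise ValueError(f"PTQ recipe is missing required section '{key}'")
    return recipe_type
```

Unlike `assert`, these checks survive `python -O` and give the user a message naming the missing field rather than a bare `AssertionError`.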

---

Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1031-1039: The validator tuple_block_sizes_scale_bits annotated as
returning "int | tuple[int, int]" is wrong because it receives and returns the
block_sizes dict; change the return type to reflect that (e.g., "dict[str, Any]
| None" or simply "dict | None") and import Any if needed; keep the function
signature on the `@field_validator`("block_sizes", mode="before") classmethod and
ensure the implementation still returns the possibly-modified dict v.
- Around line 860-868: In tuple_num_bits (the `@field_validator` method) replace
the assert with explicit input validation that raises ValueError when the
incoming list does not have length 2: check isinstance(num_bits, list), if so
verify len(num_bits) == 2 and if not raise ValueError("num_bits list must have
length 2"), then return tuple(num_bits) (cast as needed); keep returning
num_bits for non-list inputs so the method behavior matches other validators
like validate_num_bits and validate_bias.

---

Nitpick comments:
In `@modelopt/torch/opt/config.py`:
- Around line 450-473: The current resolution builds paths_to_check and checks
them in an order that prefers local filesystem files over BUILTIN_RECIPES_LIB,
which can let a local config_file shadow a built-in recipe; update the logic in
the path-building/lookup block (the code that constructs paths_to_check and the
subsequent loop that sets config_path) to detect when both a local candidate
(Path or Traversable) and a corresponding BUILTIN_RECIPES_LIB path exist and
either (a) prefer the builtin by changing the search order, or (b) keep the
current precedence but log or raise a warning when a local file shadows
BUILTIN_RECIPES_LIB; implement the chosen behavior using the existing variables
config_file, paths_to_check, BUILTIN_RECIPES_LIB and config_path so the change
is localized and easy to review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f1f85b16-6aae-4bc1-ab7f-5b3bd8d0ff00

📥 Commits

Reviewing files that changed from the base of the PR and between 1b4216b and 0d140ae.

📒 Files selected for processing (35)
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/opt/config.py
  • modelopt/torch/quantization/config.py
  • modelopt/torch/recipe/loader.py
  • modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv.yml
  • modelopt_recipes/general/ptq/int4_awq-fp8_kv.yml
  • modelopt_recipes/general/ptq/int4_blockwise_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/int8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/int8_smoothquant-fp8_kv.yml
  • modelopt_recipes/general/ptq/int8_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxfp4_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxfp4_mlp_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxfp6_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxfp8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxint8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_clip-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_full-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_lite-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_fp8_mha-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_svdquant_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv.yml
  • modelopt_recipes/general/ptq/w4a8_awq_beta-fp8_kv.yml
  • modelopt_recipes/general/ptq/w4a8_mxfp4_fp8-fp8_kv.yml
  • modelopt_recipes/general/ptq/w4a8_nvfp4_fp8-fp8_kv.yml
✅ Files skipped from review due to trivial changes (1)
  • modelopt_recipes/general/ptq/nvfp4_awq_lite-fp8_kv.yml

@shengliangxu
Contributor Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/torch/quantization/config.py (1)

768-804: ⚠️ Potential issue | 🟠 Major

Add MXFP6_DEFAULT_CFG to choices.

The module now exports MXFP6_DEFAULT_CFG at Lines 372-389, but it is missing from this whitelist. Any path that validates preset names against choices will reject the new config even though it is publicly defined.

🛠️ Suggested fix
     "MXFP4_DEFAULT_CFG",
+    "MXFP6_DEFAULT_CFG",
     "MXFP8_DEFAULT_CFG",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 768 - 804, The choices
set in config.py is missing the newly exported preset MXFP6_DEFAULT_CFG; update
the choices variable to include "MXFP6_DEFAULT_CFG" so validation against
choices accepts that preset (locate the choices declaration near the other
preset names and add the exact string "MXFP6_DEFAULT_CFG" alongside the other
entries).
♻️ Duplicate comments (6)
examples/llm_ptq/hf_ptq.py (1)

900-910: ⚠️ Potential issue | 🟠 Major

Recipe mode is still coupled to the legacy qformat flow.

The inline comment on Line 901 is not accurate: build_quant_cfg() in examples/llm_ptq/example_utils.py still keys some adjustments off the qformat string, and the later if args.qformat in QUANT_CFG_CHOICES check still decides whether quantization runs at all. With --recipe, that means the loaded recipe is no longer the source of truth—AWQ/SmoothQuant recipes can miss the existing qformat-specific handling, and a conflicting --qformat can skip quantization entirely. Please derive one effective format/mode from the recipe and use it consistently, or make --recipe mutually exclusive with the qformat-driven flags.

Also applies to: 954-965

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/hf_ptq.py` around lines 900 - 910, The recipe-driven quant
path is still influenced by args.qformat and QUANT_CFG_CHOICES; update the logic
so the effective quant format is derived from the recipe (or enforce mutual
exclusivity) before calling build_quant_cfg and before the QUANT_CFG_CHOICES
check: either (A) extract a qformat/mode from recipe (e.g., recipe.qformat or
recipe.type), set a local effective_format and pass that into build_quant_cfg
and the subsequent args.qformat checks, or (B) make --recipe mutually exclusive
with qformat-driven flags at CLI parse time and skip any args.qformat checks
when recipe is provided; apply this change around the build_quant_cfg(...) call
and the later if args.qformat in QUANT_CFG_CHOICES block and ensure
mtq.update_quant_cfg_with_kv_cache_quant uses the same derived/validated format.
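Option (B), mutual exclusivity at parse time, could look roughly like this. The flag names `--recipe` and `--qformat` follow the review text; the parser shape and defaults are assumptions, not `hf_ptq.py`'s actual argument setup.

```python
import argparse


def parse_args(argv=None):
    """Parse CLI args, rejecting --recipe combined with the qformat-driven flow."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    parser.add_argument("--recipe", default=None)
    parser.add_argument("--qformat", default=None)
    args = parser.parse_args(argv)
    if args.recipe is not None and args.qformat is not None:
        # With a recipe, the recipe is the single source of truth for the
        # quantization format; a conflicting --qformat must not silently win.
        parser.error("--recipe and --qformat are mutually exclusive")
    return args
```

This keeps the legacy `qformat` path untouched while guaranteeing that recipe runs never reach the `QUANT_CFG_CHOICES` branch with a stale format string.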
modelopt/torch/recipe/loader.py (1)

32-74: ⚠️ Potential issue | 🟠 Major

Relative Path recipe inputs still bypass built-in discovery.

Line 65 sends every non-str input straight to resolved, so load_recipe(Path("general/ptq/fp8_default-fp8_kv")) never gets .yml / .yaml probing or modelopt_recipes lookup. Keep relative Path inputs on the same resolver as strings and only skip built-in lookup for absolute filesystem paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/recipe/loader.py` around lines 32 - 74, The function
load_recipe treats non-str inputs (Path/Traversable) as already resolved which
skips the BUILTIN_RECIPES_LIB lookup and suffix probing; change the resolver
logic so that when recipe_path is a Path that is relative (not absolute) it is
processed the same way as str inputs: attempt
BUILTIN_RECIPES_LIB.joinpath(recipe_path + suffix) and probe suffixes
[""|".yml"|".yaml"], falling back to filesystem candidates (Path(recipe_path +
suffix)); only bypass built-in lookup when the Path is absolute. Keep returning
via _load_recipe_from_file and _load_recipe_from_dir as before.
modelopt/torch/quantization/config.py (2)

860-868: ⚠️ Potential issue | 🟡 Minor

Use ValueError instead of assert in tuple_num_bits.

The guard on Line 865 disappears under python -O, which means custom-backend configs can accept arbitrary-length tuples coming from YAML lists. Raise a ValueError here instead of relying on assert.

🛠️ Suggested fix
     def tuple_num_bits(cls, num_bits: int | list[int] | tuple[int, int]) -> int | tuple[int, int]:
         """Convert num_bits to tuple if list."""
         if isinstance(num_bits, list):
-            assert len(num_bits) == 2
+            if len(num_bits) != 2:
+                raise ValueError(f"num_bits list must have length 2, got {len(num_bits)}")
             return cast("tuple[int, int]", tuple(num_bits))
 
         return num_bits
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 860 - 868, The
tuple_num_bits field validator currently uses assert to check list length which
is ignored with python -O; change the check in tuple_num_bits (the
`@field_validator` classmethod) to explicitly raise a ValueError when num_bits is
a list but len(num_bits) != 2, providing a clear error message (e.g. "num_bits
list must have length 2") before casting to tuple[int, int]; keep the rest of
the logic intact so valid lists are converted to tuples and non-list inputs are
returned unchanged.

1031-1039: ⚠️ Potential issue | 🟡 Minor

Guard raw block_sizes inputs before calling .get().

This mode="before" validator runs before Pydantic has enforced the field type. A truthy non-dict input like [1, 2] will hit v.get(...) on Line 1035 and raise AttributeError instead of a validation error. The return annotation should also match the dict/None shape.

🛠️ Suggested fix
-    def tuple_block_sizes_scale_bits(cls, v) -> int | tuple[int, int]:
+    def tuple_block_sizes_scale_bits(
+        cls, v: dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None
+    ) -> dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None:
         """Convert block_sizes.scale_bits to tuple if list."""
-        if v and v.get("scale_bits"):
+        if isinstance(v, dict) and v.get("scale_bits"):
             scale_bits = v.get("scale_bits")
             if isinstance(scale_bits, list):
                 v["scale_bits"] = tuple(scale_bits)
         return v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1031 - 1039, The
validator tuple_block_sizes_scale_bits currently assumes v is a dict and calls
v.get(...), which will raise AttributeError for non-dict raw inputs (e.g.,
lists) because this runs mode="before"; update the function to first guard v
with a type check (e.g., if not isinstance(v, dict) or v is None: return v) and
only then access v.get("scale_bits") and convert list -> tuple; also change the
return type annotation from int | tuple[int,int] to a shape matching the
validator (e.g., dict | None) so the signature reflects that the function
returns the (possibly mutated) dict or None, and keep references to block_sizes
and scale_bits in the implementation of tuple_block_sizes_scale_bits.
modelopt/torch/opt/config.py (2)

395-407: ⚠️ Potential issue | 🟠 Major

Anchor __base__ lookups to the loaded file.

Line 397 and Line 462 still pass relative bases back through load_config() without the current config_path, so built-in bases can be shadowed by CWD files and external recipe trees cannot resolve their own relative bases. Resolve each relative base_cfg against the matched file's parent (or the built-in library root) before recursing.

Also applies to: 440-472, 475-510

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/config.py` around lines 395 - 407, The current
_merge_base_list and related code call load_config(base_cfg) with raw relative
paths, which lets CWD files shadow built-ins and prevents recipe-relative
resolution; update the logic (in _merge_base_list and the code paths handling
__base__ at lines around where base_cfg is iterated) to resolve each relative
base_cfg against the parent directory of the file that matched the config (or
against the built-in library root for builtin identifiers) before passing it to
load_config: if base_cfg is relative, join it with the matched config's parent
path (or library root) and normalize the path, then call
load_config(resolved_path) so anchors are properly scoped to the loaded file.
Ensure the same resolution is applied wherever base_cfg entries are recursed
(the other functions handling __base__ merging).
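The anchoring rule can be illustrated with a small helper. `resolve_base` and its parameters are hypothetical; the point is only that a relative `__base__` entry resolves against the declaring file's parent first, with the built-in library as fallback, and never against the CWD.

```python
from pathlib import Path


def resolve_base(base_cfg: str, declaring_file: Path, builtin_root: Path) -> Path:
    """Anchor a __base__ entry to the file that declared it."""
    base = Path(base_cfg)
    if base.is_absolute():
        return base
    # First choice: relative to the config that listed this base.
    anchored = (declaring_file.parent / base).resolve()
    if anchored.exists():
        return anchored
    # Fallback: an identifier like "configs/ptq/base" inside the built-in library.
    return builtin_root / base
```

With this, an external recipe tree carries its own relative bases, and a stray `configs/ptq/base.yml` in the CWD can no longer shadow a built-in.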

509-511: ⚠️ Potential issue | 🟠 Major

Don't coerce falsey non-mapping YAML into {}.

Line 509 uses yaml.safe_load(...) or {}. That turns false, 0, [], or "" into an empty config, while non-empty scalars/lists still blow up later in _resolve_bases(). load_config() promises a mapping, so only None should become {}; other top-level types should raise ValueError.

🛠️ Suggested fix
-    config_data = yaml.safe_load(config_path.read_text(encoding="utf-8")) or {}
-
-    return _resolve_bases(config_data)
+    loaded = yaml.safe_load(config_path.read_text(encoding="utf-8"))
+    if loaded is None:
+        config_data: dict[str, Any] = {}
+    elif not isinstance(loaded, dict):
+        raise ValueError(f"Config file {config_path} must contain a mapping at the top level.")
+    else:
+        config_data = loaded
+
+    return _resolve_bases(config_data)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/config.py` around lines 509 - 511, The current use of
yaml.safe_load(... ) or {} coerces falsey non-mapping values into an empty
config; instead, after calling yaml.safe_load(config_path.read_text(...))
capture the result into config_data, set config_data = {} only if the result is
None, and if the result is not a mapping (not an instance of dict) raise a
ValueError describing the unexpected top-level YAML type; then pass the
validated mapping to _resolve_bases. Reference yaml.safe_load, the local
variable config_data, the load_config function context, and _resolve_bases to
locate where to implement this check.
🧹 Nitpick comments (1)
tests/unit/torch/recipe/test_loader.py (1)

316-335: Please smoke-test the new built-in recipes.

These tests only exercise general/ptq/fp8_default-fp8_kv, so the new modelopt_recipes/general/ptq/*-fp8_kv.yml assets can still ship with broken __base__ references or metadata without failing CI.

🧪 Possible addition
+@pytest.mark.parametrize(
+    "recipe_name",
+    [
+        "general/ptq/mxfp8_default-fp8_kv",
+        "general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv",
+        "general/ptq/int8_weight_only-fp8_kv",
+        "general/ptq/nvfp4_awq_clip-fp8_kv",
+        "general/ptq/mamba_moe_nvfp4_conservative-fp8_kv",
+    ],
+)
+def test_load_recipe_new_builtins(recipe_name):
+    recipe = load_recipe(recipe_name)
+    assert recipe.recipe_type == RecipeType.PTQ
+    assert isinstance(recipe, ModelOptPTQRecipe)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/recipe/test_loader.py` around lines 316 - 335, The tests
only load a single built-in recipe; iterate over all built-in PTQ recipe assets
(e.g., files matching the pattern modelopt_recipes/general/ptq/*-fp8_kv.yml) and
call load_recipe for each to smoke-test them: for every path call load_recipe
(same function used in
test_load_recipe_builtin_with_suffix/test_load_recipe_builtin_without_suffix),
assert the returned recipe.recipe_type is RecipeType.PTQ and that
recipe.description is a non-empty string (and optionally that
ModelOptPTQRecipe-specific fields like model_quant/kv_quant are present when
appropriate) so any broken __base__ references or missing metadata fail CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/opt/config.py`:
- Around line 484-497: The code currently treats Path/Traversable configs as
final candidates and skips suffix probing and builtin lookups; change the branch
handling config_file (the Path/Traversable case) to mirror the str branch: if
config_file is a relative Path (not config_file.is_absolute()), probe for
suffixes by appending ".yml" and ".yaml" and also add
BUILTIN_RECIPES_LIB.joinpath(...) variants; if the Path already has a .yml/.yaml
suffix add it and the BUILTIN_RECIPES_LIB joinpath, and only bypass builtin
resolution for absolute Paths. Ensure you still append candidates to
paths_to_check and reuse the same variable names (config_file, paths_to_check,
BUILTIN_RECIPES_LIB).

In `@modelopt/torch/quantization/utils.py`:
- Around line 847-849: quant_cfg["quant_cfg"] can be explicitly set to None
which causes an AttributeError when calling .update(); change the logic to treat
falsy values as the default dict before updating: read the existing inner =
quant_cfg.get("quant_cfg") or {"default": {"enable": False}}, assign it back to
quant_cfg["quant_cfg"] and then .update(kv_cache_quant_cfg); reference the local
variable quant_cfg and the update call that currently uses kv_cache_quant_cfg to
locate the spot to change.
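The suggested falsy-value guard amounts to one line. This standalone sketch mirrors the dict shapes described in the review, not the real ModelOpt structures, and the glob-style keys in the test are illustrative.

```python
def update_quant_cfg_with_kv_cache_quant(quant_cfg: dict, kv_cache_quant_cfg: dict) -> dict:
    """Merge KV-cache quantizer entries, tolerating a missing or None inner dict."""
    # `or` covers both an absent key and an explicit `quant_cfg: None` from YAML.
    inner = quant_cfg.get("quant_cfg") or {"default": {"enable": False}}
    inner.update(kv_cache_quant_cfg)
    quant_cfg["quant_cfg"] = inner
    return quant_cfg
```

Without the `or` fallback, `{"quant_cfg": None}` would raise `AttributeError: 'NoneType' object has no attribute 'update'` instead of quietly falling back to the disabled default.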

---

Outside diff comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 768-804: The choices set in config.py is missing the newly
exported preset MXFP6_DEFAULT_CFG; update the choices variable to include
"MXFP6_DEFAULT_CFG" so validation against choices accepts that preset (locate
the choices declaration near the other preset names and add the exact string
"MXFP6_DEFAULT_CFG" alongside the other entries).

---

Duplicate comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 900-910: The recipe-driven quant path is still influenced by
args.qformat and QUANT_CFG_CHOICES; update the logic so the effective quant
format is derived from the recipe (or enforce mutual exclusivity) before calling
build_quant_cfg and before the QUANT_CFG_CHOICES check: either (A) extract a
qformat/mode from recipe (e.g., recipe.qformat or recipe.type), set a local
effective_format and pass that into build_quant_cfg and the subsequent
args.qformat checks, or (B) make --recipe mutually exclusive with qformat-driven
flags at CLI parse time and skip any args.qformat checks when recipe is
provided; apply this change around the build_quant_cfg(...) call and the later
if args.qformat in QUANT_CFG_CHOICES block and ensure
mtq.update_quant_cfg_with_kv_cache_quant uses the same derived/validated format.

In `@modelopt/torch/opt/config.py`:
- Around line 395-407: The current _merge_base_list and related code call
load_config(base_cfg) with raw relative paths, which lets CWD files shadow
built-ins and prevents recipe-relative resolution; update the logic (in
_merge_base_list and the code paths handling __base__ at lines around where
base_cfg is iterated) to resolve each relative base_cfg against the parent
directory of the file that matched the config (or against the built-in library
root for builtin identifiers) before passing it to load_config: if base_cfg is
relative, join it with the matched config's parent path (or library root) and
normalize the path, then call load_config(resolved_path) so anchors are properly
scoped to the loaded file. Ensure the same resolution is applied wherever
base_cfg entries are recursed (the other functions handling __base__ merging).
- Around line 509-511: The current use of yaml.safe_load(... ) or {} coerces
falsey non-mapping values into an empty config; instead, after calling
yaml.safe_load(config_path.read_text(...)) capture the result into config_data,
set config_data = {} only if the result is None, and if the result is not a
mapping (not an instance of dict) raise a ValueError describing the unexpected
top-level YAML type; then pass the validated mapping to _resolve_bases.
Reference yaml.safe_load, the local variable config_data, the load_config
function context, and _resolve_bases to locate where to implement this check.

In `@modelopt/torch/quantization/config.py`:
- Around line 860-868: The tuple_num_bits field validator currently uses assert
to check list length which is ignored with python -O; change the check in
tuple_num_bits (the `@field_validator` classmethod) to explicitly raise a
ValueError when num_bits is a list but len(num_bits) != 2, providing a clear
error message (e.g. "num_bits list must have length 2") before casting to
tuple[int, int]; keep the rest of the logic intact so valid lists are converted
to tuples and non-list inputs are returned unchanged.
- Around line 1031-1039: The validator tuple_block_sizes_scale_bits currently
assumes v is a dict and calls v.get(...), which will raise AttributeError for
non-dict raw inputs (e.g., lists) because this runs mode="before"; update the
function to first guard v with a type check (e.g., if not isinstance(v, dict) or
v is None: return v) and only then access v.get("scale_bits") and convert list
-> tuple; also change the return type annotation from int | tuple[int,int] to a
shape matching the validator (e.g., dict | None) so the signature reflects that
the function returns the (possibly mutated) dict or None, and keep references to
block_sizes and scale_bits in the implementation of
tuple_block_sizes_scale_bits.

In `@modelopt/torch/recipe/loader.py`:
- Around line 32-74: The function load_recipe treats non-str inputs
(Path/Traversable) as already resolved which skips the BUILTIN_RECIPES_LIB
lookup and suffix probing; change the resolver logic so that when recipe_path is
a Path that is relative (not absolute) it is processed the same way as str
inputs: attempt BUILTIN_RECIPES_LIB.joinpath(recipe_path + suffix) and probe
suffixes [""|".yml"|".yaml"], falling back to filesystem candidates
(Path(recipe_path + suffix)); only bypass built-in lookup when the Path is
absolute. Keep returning via _load_recipe_from_file and _load_recipe_from_dir as
before.

---

Nitpick comments:
In `@tests/unit/torch/recipe/test_loader.py`:
- Around line 316-335: The tests only load a single built-in recipe; iterate
over all built-in PTQ recipe assets (e.g., files matching the pattern
modelopt_recipes/general/ptq/*-fp8_kv.yml) and call load_recipe for each to
smoke-test them: for every path call load_recipe (same function used in
test_load_recipe_builtin_with_suffix/test_load_recipe_builtin_without_suffix),
assert the returned recipe.recipe_type is RecipeType.PTQ and that
recipe.description is a non-empty string (and optionally that
ModelOptPTQRecipe-specific fields like model_quant/kv_quant are present when
appropriate) so any broken __base__ references or missing metadata fail CI.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dfdf41d8-4d47-46b4-b9e6-f063d9dde90f

📥 Commits

Reviewing files that changed from the base of the PR and between 1b4216b and e39d5ee.

📒 Files selected for processing (41)
  • examples/llm_ptq/example_utils.py
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/opt/config.py
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/utils.py
  • modelopt/torch/recipe/loader.py
  • modelopt_recipes/configs/ptq/base.yml
  • modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv.yml
  • modelopt_recipes/general/ptq/int4_awq-fp8_kv.yml
  • modelopt_recipes/general/ptq/int4_blockwise_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/int8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/int8_smoothquant-fp8_kv.yml
  • modelopt_recipes/general/ptq/int8_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxfp4_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxfp4_mlp_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxfp6_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxfp8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxint8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_clip-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_full-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_lite-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_fp8_mha-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_svdquant_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv.yml
  • modelopt_recipes/general/ptq/w4a8_awq_beta-fp8_kv.yml
  • modelopt_recipes/general/ptq/w4a8_mxfp4_fp8-fp8_kv.yml
  • modelopt_recipes/general/ptq/w4a8_nvfp4_fp8-fp8_kv.yml
  • pyproject.toml
  • tests/unit/torch/recipe/__init__.py
  • tests/unit/torch/recipe/test_loader.py
✅ Files skipped from review due to trivial changes (2)
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv.yml
  • tests/unit/torch/recipe/__init__.py
🚧 Files skipped from review as they are similar to previous changes (27)
  • modelopt_recipes/general/ptq/nvfp4_default-fp8_kv.yml
  • modelopt_recipes/configs/ptq/base.yml
  • modelopt_recipes/general/ptq/int8_smoothquant-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxint8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv.yml
  • pyproject.toml
  • modelopt_recipes/general/ptq/w4a8_awq_beta-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_lite-fp8_kv.yml
  • modelopt_recipes/general/ptq/int4_awq-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxfp6_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_svdquant_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_fp8_mha-fp8_kv.yml
  • modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv.yml
  • examples/llm_ptq/example_utils.py
  • modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv.yml
  • modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxfp4_mlp_weight_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/int8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/w4a8_nvfp4_fp8-fp8_kv.yml
  • modelopt_recipes/general/ptq/mxfp4_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_awq_full-fp8_kv.yml
  • modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv.yml
  • modelopt_recipes/general/ptq/w4a8_mxfp4_fp8-fp8_kv.yml
  • modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml

- configs/ptq/w8a8_fp8_fp8
- configs/ptq/calib_algo_max

kv_quant:
Contributor

why do we need a separate kv_quant field?

kv_quant:
- configs/ptq/kv_fp8

model_quant:
Contributor

@realAsma realAsma Mar 12, 2026

This model_quant and kv_quant look confusing to me.

Why do we need these fields? Can it be flat?

configs/ptq/quant_cfg/base
configs/ptq/quant_cfg/base_mamba
configs/ptq/quant_cfg/w8a8_fp8_fp8
configs/ptq/algorithm/calib_algo_max
configs/ptq/quant_cfg/kv_fp8
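
Pieced together from the snippets above, a grouped recipe file under the current scheme would look roughly like this (illustrative sketch only; the exact schema is whatever the PR defines):

```yaml
# Hypothetical contents of a general PTQ recipe file under the grouped layout.
model_quant:
  - configs/ptq/w8a8_fp8_fp8
  - configs/ptq/calib_algo_max

kv_quant:
  - configs/ptq/kv_fp8
```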

@jenchen13 jenchen13 self-requested a review March 12, 2026 22:52
@shengliangxu shengliangxu requested a review from a team as a code owner March 13, 2026 01:15
1. start a new config system using yaml/yml files. The config system
   adopts Hydra-style overrides via a __base__ tag, which matches
   pydantic's __base__ model mechanism. __base__ is a list; semantically
   it is inheritance. The implementation uses OmegaConf to merge
   configs in the order given by the __base__ list.

2. add a new top level package: modelopt_recipes

   I want it to be a top level package to make it clear that the
   modelopt package holds the code and this new package holds the recipes

3. make a list of reusable quant config units, sitting at
   modelopt_recipes/configs/ptq/.... Mostly 3 types:

   3.1 model weights and activations quant config
   3.2 kv quant config
   3.3 algorithm config

   with the __base__ config mechanism, this gives a much clearer
   config system

4. implement some of the existing quantization recipes with the new
   config system as model-agnostic general recipes, not yet actually in
   use. These recipes sit inside modelopt_recipes/general/ptq/...

5. make sure the configs from the new config system match the existing configs

6. extend the hf_ptq script to enable recipe-based PTQ

7. tested hf_ptq using both built-in and external config files. Example
   script:

   python examples/llm_ptq/hf_ptq.py         \
     --model Qwen/Qwen3-8B                   \
     --recipe general/ptq/fp8_default-fp8_kv \
     --export_path=fp8_default-fp8_kv        \
     --calib_size=16                         \
     --batch_size=0                          \
     --trust_remote_code                     \
     --export_fmt=hf

8. A recipe can be a single yaml file or a directory with a
   recipe.{yml,yaml} file

9. use lists instead of tuples in the hard-coded configs. YAML files do
    not have a tuple type. For the bits, we accept both tuples and
    lists, using lists in code so we can directly compare yaml file
    contents with code configs.

10. make the config-file-based configs the default and deprecate the
    hardcoded ones. Convert all PTQ configs

11. Added tests for various recipe loading
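
The tuple-versus-list point can be sketched as a small normalization step (the helper name here is hypothetical; the real handling lives in the quantization config validation):

```python
# Illustrative sketch: accept num_bits as either a tuple or a list, but
# normalize to a list so values parsed from YAML (which has no tuple
# type) compare equal to the hard-coded in-code configs.

def normalize_num_bits(num_bits):
    """Return num_bits with tuples converted to lists; pass others through."""
    if isinstance(num_bits, tuple):
        return list(num_bits)
    return num_bits

code_cfg = {"num_bits": normalize_num_bits((4, 3))}  # hard-coded config
yaml_cfg = {"num_bits": [4, 3]}                      # parsed from a YAML file
assert code_cfg == yaml_cfg
```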

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
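
The __base__ inheritance semantics described in point 1 can be sketched as follows. The PR merges YAML files with OmegaConf; this minimal stdlib stand-in uses a plain deep-merge over a dict registry (config names and contents here are hypothetical) to show the "bases first, in list order, then the child on top" rule:

```python
# Minimal sketch of __base__ resolution: merge each base in order, then
# merge the config's own keys on top. CONFIGS stands in for YAML files.

def deep_merge(dst, src):
    """Recursively merge src into dst; src values win on conflicts."""
    out = dict(dst)
    for key, val in src.items():
        if key in out and isinstance(out[key], dict) and isinstance(val, dict):
            out[key] = deep_merge(out[key], val)
        else:
            out[key] = val
    return out

def resolve(name, registry):
    """Resolve a config by merging its __base__ list in order, then itself."""
    cfg = dict(registry[name])
    bases = cfg.pop("__base__", [])
    merged = {}
    for base in bases:
        merged = deep_merge(merged, resolve(base, registry))
    return deep_merge(merged, cfg)

CONFIGS = {
    "configs/ptq/w8a8_fp8_fp8": {
        "quant_cfg": {"*weight_quantizer": {"num_bits": [4, 3]}},
    },
    "configs/ptq/kv_fp8": {
        "quant_cfg": {"*kv_bmm_quantizer": {"num_bits": [4, 3]}},
    },
    "general/ptq/fp8_default-fp8_kv": {
        "__base__": ["configs/ptq/w8a8_fp8_fp8", "configs/ptq/kv_fp8"],
        "quant_cfg": {"*weight_quantizer": {"axis": None}},
    },
}

recipe = resolve("general/ptq/fp8_default-fp8_kv", CONFIGS)
# Bases contribute both quantizer entries; the child adds "axis" on top.
```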
@shengliangxu shengliangxu force-pushed the shengliangx/modeopt-recipe-lib-mvp branch from 5fe9f37 to 03ad6a9 Compare March 13, 2026 01:21
@shengliangxu shengliangxu force-pushed the shengliangx/modeopt-recipe-lib-mvp branch from 398cbaa to 901b948 Compare March 13, 2026 07:26
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@shengliangxu shengliangxu force-pushed the shengliangx/modeopt-recipe-lib-mvp branch from 9fcf0f1 to a6529be Compare March 13, 2026 08:00

# Check if any bmm_quantizer is in the quant_cfg. If so, we need to enable the bmm_quantizer.
if enable_quant_kv_cache:
    quant_cfg = mtq.update_quant_cfg_with_kv_cache_quant(
Contributor
nice!!
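
A rough sketch of what such an update step might do (hypothetical; the real mtq.update_quant_cfg_with_kv_cache_quant may have a different signature and semantics): merge KV-cache quantizer patterns into an existing quant config without disturbing its other entries.

```python
# Hypothetical stand-in for the helper called above: overlay KV-cache
# quantizer entries (e.g. "*kv_bmm_quantizer") onto a quant config dict.

def update_quant_cfg_with_kv_cache_quant(quant_cfg, kv_cache_quant_cfg):
    """Return a copy of quant_cfg with KV-cache quantizer entries merged in."""
    merged = dict(quant_cfg)
    merged["quant_cfg"] = {**quant_cfg.get("quant_cfg", {}), **kv_cache_quant_cfg}
    return merged

base = {"quant_cfg": {"*weight_quantizer": {"num_bits": [4, 3]}}}
kv = {"*kv_bmm_quantizer": {"num_bits": [4, 3], "enable": True}}
cfg = update_quant_cfg_with_kv_cache_quant(base, kv)
```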
