ModelOpt Framework, Recipe Lib, converting subset of existing recipes 1/N #1000
shengliangxu wants to merge 4 commits into main from
Conversation
📝 Walkthrough

Adds a recipe-driven PTQ system: new recipe models and loader, YAML-based config loading/merging with base inheritance, many PTQ recipe/config YAMLs, example CLI support for recipes, and related refactors to quant_cfg handling across examples and quantization utilities.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant PTQScript as PTQ Script<br/>(hf_ptq.py)
    participant RecipeLoader as Recipe Loader<br/>(loader.py)
    participant ConfigLoader as Config Loader<br/>(opt/config.py)
    participant FileSystem as File System/<br/>Built-in Resources
    participant Quantizer as Quantizer<br/>(mtq / quant utils)
    User->>PTQScript: run with --recipe path/to/recipe
    PTQScript->>RecipeLoader: load_recipe(recipe_path)
    RecipeLoader->>FileSystem: resolve path (builtin or filesystem)
    alt merged YAML file
        RecipeLoader->>FileSystem: read recipe YAML
        RecipeLoader->>ConfigLoader: load_config(model_quant references)
        ConfigLoader->>FileSystem: read referenced YAMLs, resolve __base__ recursively
        ConfigLoader-->>RecipeLoader: merged QuantizeConfig
    else directory-format recipe
        RecipeLoader->>FileSystem: read recipe descriptor and model_quant/kv_quant files
        RecipeLoader->>ConfigLoader: load_config for each component
        ConfigLoader-->>RecipeLoader: merged components
    end
    RecipeLoader-->>PTQScript: ModelOptPTQRecipe (model_quant, kv_quant)
    PTQScript->>PTQScript: build_quant_cfg(recipe.model_quant)
    PTQScript->>PTQScript: update_quant_cfg_with_kv_cache_quant(recipe.kv_quant)
    PTQScript->>Quantizer: quantize(model, quant_cfg)
    Quantizer-->>User: quantized model
```
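The `__base__` resolution step in the diagram can be sketched as a small recursive merge. This is a toy illustration, not the actual ModelOpt loader: the in-memory `registry` dict stands in for the YAML files on disk, and the merge semantics (child overrides parent, later bases override earlier ones) are assumptions for illustration.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge `override` into `base`; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(name: str, registry: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Resolve a config by name, folding in its `__base__` parents first.

    `registry` stands in for the YAML files on disk; the real loader reads
    and parses files instead.
    """
    data = dict(registry[name])
    merged: dict[str, Any] = {}
    for base_name in data.pop("__base__", []):
        merged = deep_merge(merged, load_config(base_name, registry))
    return deep_merge(merged, data)


# Toy stand-in for configs/ptq/*.yml files.
registry = {
    "configs/ptq/base": {"quant_cfg": {"*weight_quantizer": {"enable": False}}},
    "configs/ptq/w8a8_fp8_fp8": {
        "__base__": ["configs/ptq/base"],
        "quant_cfg": {"*weight_quantizer": {"enable": True, "num_bits": [4, 3]}},
    },
}
cfg = load_config("configs/ptq/w8a8_fp8_fp8", registry)
```

The derived config keeps the base's keys but the child's overrides win, which matches the "resolve __base__ recursively" arrow in the diagram.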
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ Passed checks (4 passed)
ee14948 to c2d3678
Actionable comments posted: 16
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/llm_ptq/example_utils.py (1)
196-241: ⚠️ Potential issue | 🟠 Major

Clone `quant_cfg` before applying model-specific overrides.

This function mutates the passed-in config in the `moe_calib_experts_ratio`, Gemma `int8_sq`, and `phi4mm` branches; only the AWQ path gets a deep copy today. Reusing the same recipe/config object across calls will leak those edits into later quantization runs.

Proposed fix

```diff
 def build_quant_cfg(
     qformat,
     quant_cfg,
     awq_block_size,
     model_type,
     moe_calib_experts_ratio: float | None = None,
 ) -> dict[str, Any]:
+    quant_cfg = copy.deepcopy(quant_cfg)
     if "awq" in str(quant_cfg.get("algorithm")):
-        quant_cfg = copy.deepcopy(quant_cfg)
         weight_quantizer = quant_cfg["quant_cfg"]["*weight_quantizer"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_ptq/example_utils.py` around lines 196 - 241, The function build_quant_cfg mutates the input quant_cfg in multiple branches (moe_calib_experts_ratio, Gemma int8_sq, phi4mm), causing callers to observe leaked changes; fix this by making a deep copy of quant_cfg immediately on entry (e.g., quant_cfg = copy.deepcopy(quant_cfg)) and then perform all modifications on that copy (so weight_quantizer, quant_cfg["algorithm"], and quant_cfg["quant_cfg"] edits do not affect the caller's object). Ensure the deepcopy happens before any reads/writes that may alter nested structures in build_quant_cfg.
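The leak described above is easy to reproduce in isolation. The sketch below uses invented names (`QUANT_CFG_TEMPLATE`, a simplified function body), not the real `build_quant_cfg`; it only demonstrates the clone-on-entry pattern the fix proposes.

```python
import copy

# Module-level template shared across calls (analogous to a recipe config).
QUANT_CFG_TEMPLATE = {
    "algorithm": "max",
    "quant_cfg": {"*weight_quantizer": {"num_bits": 8}},
}


def build_quant_cfg_safe(quant_cfg):
    # Clone on entry so model-specific overrides stay local to this call.
    quant_cfg = copy.deepcopy(quant_cfg)
    quant_cfg["quant_cfg"]["*weight_quantizer"]["num_bits"] = 4  # override
    return quant_cfg


tuned = build_quant_cfg_safe(QUANT_CFG_TEMPLATE)
```

Without the `deepcopy`, the `num_bits` edit would persist in `QUANT_CFG_TEMPLATE` and silently affect every later quantization run that reuses it.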
🟡 Minor comments (11)
modelopt_recipes/configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml (1)

1-2: ⚠️ Potential issue | 🟡 Minor

Filename appears to have unintentional duplication.

The filename `w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml` contains `_nvfp4_mlp_only` repeated twice. This looks like a copy-paste error. Consider renaming to something like `w4a4_nvfp4_mlp_only.yml` for consistency with other config naming conventions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml` around lines 1 - 2, The file name contains a duplicated token ("_nvfp4_mlp_only") — rename the config file from w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml to w4a4_nvfp4_mlp_only.yml and update any references to this file (e.g., in tests, loaders, or CI) so they point to the new filename to keep naming consistent with other configs.

modelopt_recipes/configs/ptq/kv_fp8_affine.yml (1)
16-22: ⚠️ Potential issue | 🟡 Minor

Add missing `enable: true` property.

The YAML structure for the `bias` section is correct; however, `enable: true` should be added as a sibling property to `num_bits` and `bias`, matching the structure in related config files (`kv_fp8.yml`, `kv_nvfp4_affine.yml`).

Proposed fix

```diff
 quant_cfg:
   '*[kv]_bmm_quantizer':
     num_bits: [4, 3]
     bias:
       -2:
       -4:
       type: static
+    enable: true
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/configs/ptq/kv_fp8_affine.yml` around lines 16 - 22, In the quant_cfg block for '*[kv]_bmm_quantizer' add the missing sibling property "enable: true" (next to num_bits and bias) so the entry reads the same structure as other configs (e.g., kv_fp8.yml and kv_nvfp4_affine.yml); update the YAML under the '*[kv]_bmm_quantizer' key to include enable: true directly beneath num_bits and alongside bias to enable the quantizer.

modelopt_recipes/general/README (1)
3-3: ⚠️ Potential issue | 🟡 Minor

Fix the typo in the PTQ description.

`Traing` should be `Training`.

Suggested fix

```diff
-ptq/*: model-agnostic general Post Traing Quantization recipes
+ptq/*: model-agnostic general Post Training Quantization recipes
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/general/README` at line 3, Fix the typo in the README line that currently reads "ptq/*: model-agnostic general Post Traing Quantization recipes" by changing "Traing" to "Training" so it reads "ptq/*: model-agnostic general Post Training Quantization recipes"; locate the string containing "ptq/*" or "Post Traing Quantization" and update it accordingly.

modelopt_recipes/configs/ptq/w8a8_int8_per_channel_int8.yml (1)
20-22: ⚠️ Potential issue | 🟡 Minor

Use explicit `axis: null` or omit the key entirely.

Line 22 has a blank `axis:` that parses to YAML null. While this works correctly (functionally equivalent to `axis: None` for per-tensor quantization), it's ambiguous and inconsistent with explicit patterns in Python tests and code. For clarity and consistency across the codebase, either write `axis: null` explicitly or remove the key entirely (the schema defaults `axis` to `None`).

This pattern appears in other configs as well (w8a8_fp8_fp8.yml, w4a8_mxfp4_fp8.yml).
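For reference, PyYAML treats a blank value and an explicit `null` identically, so the change is purely about readability. A small demo, assuming PyYAML is installed:

```python
import yaml  # PyYAML

# A blank value after the colon parses to Python None,
# exactly like the explicit `null` spelling.
blank = yaml.safe_load("num_bits: 8\naxis:\n")
explicit = yaml.safe_load("num_bits: 8\naxis: null\n")
```

Both documents produce `{"num_bits": 8, "axis": None}`, which is why the review asks only for the clearer spelling, not a behavioral fix.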
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/configs/ptq/w8a8_int8_per_channel_int8.yml` around lines 20 - 22, The YAML fragment under the "*input_quantizer" mapping currently contains a blank "axis:" key which parses to null but is ambiguous; update the mapping by either setting "axis: null" explicitly or removing the "axis" key entirely so the schema default (None) is used—apply the same change wherever you see the blank "axis:" pattern (e.g., in the w8a8_fp8_fp8.yml and w4a8_mxfp4_fp8.yml configs) to keep the intent consistent with Python tests and code.

modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/recipe.yml (1)
18-18: ⚠️ Potential issue | 🟡 Minor

Typo in description: missing space.
"per tokenactivation" should be "per token activation".
📝 Proposed fix
```diff
-description: "FP8 per channel weight, FP8 per tokenactivation, FP8 KV Cache, max calibration."
+description: "FP8 per channel weight, FP8 per token activation, FP8 KV Cache, max calibration."
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/recipe.yml` at line 18, Description string contains a typo "per tokenactivation"; update the description value in recipe.yml (the description field) to read "FP8 per channel weight, FP8 per token activation, FP8 KV Cache, max calibration." — locate the description key in modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/recipe.yml and insert the missing space between "token" and "activation".

modelopt/torch/opt/config.py (1)
426-430: ⚠️ Potential issue | 🟡 Minor

Validate the YAML root before assuming a mapping.

Blank/comment-only files or a list/scalar document make line 430 fail with `AttributeError` because `yaml.safe_load()` can return non-dicts. Raise a `ValueError` here so recipe load failures stay actionable.

Proposed fix

```diff
 content = config_path.read_text(encoding="utf-8")
 config_data = yaml.safe_load(content)
+if not isinstance(config_data, dict):
+    raise ValueError(
+        f"Config {config_path} must contain a top-level mapping, "
+        f"got {type(config_data).__name__}"
+    )
 bases = [load_config(base_cfg) for base_cfg in config_data.pop("__base__", [])]
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/config.py` around lines 426 - 430, The code reads YAML into config_data and then assumes it's a mapping when building bases in load_config; validate that config_data is a dict after yaml.safe_load(content) and raise a clear ValueError if not (e.g., for None, list, or scalar documents) so recipe load failures remain actionable; update the logic in the function around config_data and the bases = [load_config(...)] line to check isinstance(config_data, dict) and raise with a message referencing config_path and the unexpected type.

modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv/recipe.yml (1)
18-18: ⚠️ Potential issue | 🟡 Minor

Description says this recipe quantizes activations, but the recipe is weight-only.

Line 18 conflicts with the `weight_only` recipe name and the companion `model_quant.yml` base (configs/ptq/w8_fp8_2d_blockwise). That makes the catalog metadata misleading when users pick recipes.

📝 Suggested fix

```diff
-description: "FP8 2D blockwise weights, FP8 activation, FP8 KV Cache, max calibration."
+description: "FP8 2D blockwise weights, no activation quantization, FP8 KV Cache, max calibration."
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv/recipe.yml` at line 18, The description string currently reads "FP8 2D blockwise weights, FP8 activation, FP8 KV Cache, max calibration." but the recipe is weight-only (see weight_only and base model_quant.yml reference to configs/ptq/w8_fp8_2d_blockwise); update the description to accurately reflect that only weights are quantized (e.g., "FP8 2D blockwise weight-only quantization, FP8 KV Cache, max calibration") and remove or reword any mention of FP8 activation so catalog metadata matches the recipe name and base config.

modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/model_quant.yml (1)
16-19: ⚠️ Potential issue | 🟡 Minor

Rename config file to remove duplicated suffix.

The referenced config file `w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml` has a clear naming error: the suffix `_nvfp4_mlp_only` is duplicated. The file should be renamed to `w4a4_nvfp4_mlp_only.yml` for clarity. After renaming, update this reference from `configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only` to `configs/ptq/w4a4_nvfp4_mlp_only`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/model_quant.yml` around lines 16 - 19, Update the duplicated config reference: replace the base entry "configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only" with the corrected name "configs/ptq/w4a4_nvfp4_mlp_only" in the __base__ list inside model_quant.yml; also ensure the actual file on disk is renamed from w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml to w4a4_nvfp4_mlp_only.yml so the new reference resolves correctly.

modelopt/torch/recipe/config.py (1)
35-38: ⚠️ Potential issue | 🟡 Minor

Replace the copied docstrings in these public models.

Both docstrings describe unrelated layer-rule behavior instead of recipe configuration, so they will mislead anyone relying on generated docs or IDE help.

✏️ Suggested cleanup

```diff
 class ModelOptRecipeBase(ModeloptBaseConfig):
-    """Base configuration class for model optimization recipes.
-
-    If a layer name matches ``"*output_layer*"``, the attributes will be replaced with ``{"enable": False}``.
-    """
+    """Base configuration class for model optimization recipes."""
@@
 class ModelOptPTQRecipe(ModelOptRecipeBase):
-    """Our config class for PTQ recipes.
-
-    Rules are what governs the configuration for modifying dynamic module classes.
-    """
+    """Configuration model for PTQ recipes."""
```

Also applies to: 66-69

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/recipe/config.py` around lines 35 - 38, The docstrings in the public configuration classes in modelopt/torch/recipe/config.py currently contain incorrect, copied text about layer-rule behavior; update the docstrings to accurately describe each class's purpose and public API (e.g., the base configuration class for model optimization recipes) and remove the unrelated sentence about matching "*output_layer*" and replacing attributes; ensure both occurrences (the base config class and the other public config class in this file) describe their configuration fields, defaults, and intended usage for generated docs and IDE help.

modelopt/torch/quantization/config.py (1)
845-853: ⚠️ Potential issue | 🟡 Minor

Replace `assert` with `ValueError` in validator.

Using `assert` in a field validator is fragile because assertions can be disabled with `python -O`. Pydantic validators should raise `ValueError` or `TypeError` for invalid input.

🛡️ Proposed fix

```diff
 @field_validator("num_bits", mode="before")
 @classmethod
 def tuple_num_bits(cls, num_bits: int | list[int] | tuple[int, int]) -> int | tuple[int, int]:
     """Convert num_bits to tuple if list."""
     if isinstance(num_bits, list):
-        assert len(num_bits) == 2
+        if len(num_bits) != 2:
+            raise ValueError("num_bits list must have exactly 2 elements")
         return cast("tuple[int, int]", tuple(num_bits))
     return num_bits
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 845 - 853, The tuple_num_bits field validator currently uses assert to check list length, which can be disabled; update the validator (tuple_num_bits in modelopt/torch/quantization/config.py) to raise a ValueError (or TypeError) with a clear message when num_bits is a list but len(num_bits) != 2, then return the cast tuple as before; leave other behavior unchanged.

modelopt/torch/recipe/loader.py (1)
76-79: ⚠️ Potential issue | 🟡 Minor

Replace `assert` with explicit `ValueError` for validation.

Using `assert` for input validation is fragile because assertions can be disabled when Python runs with the `-O` (optimize) flag. This would silently skip validation and could lead to cryptic errors downstream.

🛡️ Proposed fix

```diff
-assert isinstance(recipe_descriptor, dict), "Recipe descriptor should be a dictionary."
-assert "recipe_type" in recipe_descriptor, (
-    "Recipe descriptor should contain 'recipe_type' field."
-)
+if not isinstance(recipe_descriptor, dict):
+    raise ValueError("Recipe descriptor should be a dictionary.")
+if "recipe_type" not in recipe_descriptor:
+    raise ValueError("Recipe descriptor should contain 'recipe_type' field.")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/recipe/loader.py` around lines 76 - 79, Replace the fragile assert-based checks for the incoming recipe_descriptor with explicit validation that raises ValueError: check that recipe_descriptor is a dict and if not raise ValueError("Recipe descriptor should be a dictionary."); then check that "recipe_type" is present and if not raise ValueError("Recipe descriptor should contain 'recipe_type' field."). Update the validation in loader.py where recipe_descriptor is handled so the function (the block containing the current assert lines) raises these ValueErrors instead of using assert.
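To make the difference concrete, the sketch below uses a hypothetical standalone function (not the actual loader code): the explicit raises keep firing under `python -O`, whereas `assert`-based checks would be compiled away.

```python
def validate_recipe_descriptor(recipe_descriptor):
    """Validate a parsed recipe descriptor with explicit errors.

    Unlike `assert`, these checks still run when Python is started
    with the -O (optimize) flag.
    """
    if not isinstance(recipe_descriptor, dict):
        raise ValueError("Recipe descriptor should be a dictionary.")
    if "recipe_type" not in recipe_descriptor:
        raise ValueError("Recipe descriptor should contain 'recipe_type' field.")
    return recipe_descriptor


descriptor = validate_recipe_descriptor({"recipe_type": "ptq"})
```

Callers get a specific, catchable `ValueError` with an actionable message instead of a bare `AssertionError` (or, worse, no error at all under `-O`).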
🧹 Nitpick comments (5)
modelopt_recipes/configs/ptq/calib_algo_none.yml (1)
16-16: Make the null sentinel explicit.
`algorithm:` parses as `null`, but spelling that out makes the "no calibration" intent clearer in these composed configs.

Suggested cleanup

```diff
-algorithm:
+algorithm: null
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/configs/ptq/calib_algo_none.yml` at line 16, The YAML key "algorithm" currently parses to null implicitly; make the "no calibration" intent explicit by setting the algorithm field to an explicit null sentinel (e.g., use the YAML null token) so readers and config composition logic see the intended "no calibration" value for the algorithm key.

modelopt_recipes/configs/README (1)
1-3: Clarify the path wording in this README.

Line 3 reads awkwardly and leaves the reference root implicit. I'd rewrite this to something like: "Recipes can reference shared configs via paths under configs/...."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/configs/README` around lines 1 - 3, Update the README sentence "The recipes can refer to the configs using path of: configs/..." to clearer wording; replace it with "Recipes can reference shared configs via paths under configs/..." so the reference root is explicit and reads naturally (edit the README line containing that original phrase).

modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv/recipe.yml (1)
18-18: Tighten the recipe description wording.

The current phrase "no quantization activation" is hard to parse. Something like "NVFP4 MLP weight-only, no activation quantization, FP8 KV cache, max calibration" would read more cleanly and better match the folder name.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv/recipe.yml` at line 18, Update the description string in recipe.yml to a clearer phrasing that matches the folder name and intent: replace the current description value with something like "NVFP4 MLP weight-only, no activation quantization, FP8 KV cache, max calibration" so it reads cleanly and explicitly references weight-only MLP, no activation quantization, FP8 KV cache, and max calibration.

modelopt_recipes/configs/ptq/kv_fp8.yml (1)
16-19: Document what `num_bits: [4, 3]` means here.

This file is a shared base used by multiple recipes, but the FP8 encoding behind `[4, 3]` is not obvious at a glance. A short inline comment would make future edits much safer.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/configs/ptq/kv_fp8.yml` around lines 16 - 19, Add a short inline comment next to the num_bits entry in the '*[kv]_bmm_quantizer' block explaining that num_bits: [4, 3] specifies the FP8 format (4 exponent bits and 3 mantissa/significand bits, i.e., E4M3) and note any relevant bias/representation detail or mapping used by your quantizer implementation so future editors understand the encoding expected by kv_fp8.yml and the quantizer code paths (e.g., references to E4M3 vs E3M4 conventions).

modelopt/torch/quantization/config.py (1)
149-168: Consider the implications of module-level assertions.

The assertions at module import time (lines 166-168 and throughout the file) tightly couple Python code to YAML config files. This has two concerns:

- Import failure risk: If any referenced YAML file is missing or doesn't match, the entire `modelopt.torch.quantization.config` module fails to import, breaking all downstream code.
- Assertion bypass: With `python -O`, assertions are disabled, silently skipping these consistency checks.

If these checks are critical, consider:

- Moving them to a dedicated test file
- Using explicit `if ... raise RuntimeError(...)` instead of `assert`
- Adding a lazy validation approach triggered on first use
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 149 - 168, The module currently performs a module-level assert comparing OmegaConf.to_container(OmegaConf.create({"quant_cfg": _default_disabled_quantizer_cfg}), resolve=True) to load_config("configs/ptq/base"), which can break imports or be bypassed under python -O; replace that assert with an explicit runtime check that raises a clear RuntimeError if the configs mismatch (e.g., compute the two values using OmegaConf.to_container and load_config and if they differ raise RuntimeError with a descriptive message), or alternatively move this validation into a dedicated unit test or a lazily-invoked validator function called on first use; ensure you update the code around _default_disabled_quantizer_cfg, OmegaConf.to_container/OmegaConf.create, and load_config(...) accordingly.
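The "lazy validation" option could look roughly like the sketch below. All names here are invented for illustration; the real consistency check compares the Python defaults against the packaged YAML via OmegaConf and `load_config`.

```python
import functools

_EXPECTED_DEFAULT = {"quant_cfg": {"default": {"enable": False}}}


def _load_builtin_base_config():
    # Stand-in for load_config("configs/ptq/base"); the real version
    # reads the packaged YAML from the recipe library.
    return {"quant_cfg": {"default": {"enable": False}}}


@functools.cache
def ensure_config_consistency() -> bool:
    """Check YAML/Python parity once, on first use, instead of at import.

    Raises RuntimeError (not bypassable via python -O) on drift; the
    cached result makes repeated calls essentially free.
    """
    loaded = _load_builtin_base_config()
    if loaded != _EXPECTED_DEFAULT:
        raise RuntimeError(f"Built-in YAML drifted from Python defaults: {loaded!r}")
    return True
```

Call sites invoke `ensure_config_consistency()` before using the configs, so a broken YAML file surfaces as a clear runtime error at the point of use rather than an import failure for every downstream module.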
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm_ptq/multinode_ptq.py`:
- Around line 330-347: QUANT_CFG_CHOICES[args.qformat] is a shared template and
must be cloned before mutation to avoid leaking changes; import the stdlib copy
module and call copy.deepcopy on QUANT_CFG_CHOICES[args.qformat] into quant_cfg
before passing it to build_quant_cfg, and likewise ensure when calling
mtq.update_quant_cfg_with_kv_cache_quant you pass a cloned quant_cfg (or
deepcopy the KV_QUANT_CFG_CHOICES[args.kv_cache_qformat]["quant_cfg"] if that is
mutated) so neither build_quant_cfg nor update_quant_cfg_with_kv_cache_quant
mutates module-level objects.
In `@modelopt_recipes/configs/ptq/kv_nvfp4_affine.yml`:
- Around line 19-26: The YAML places "type: dynamic" and "scale_bits: [4, 3]"
under block_sizes and "type: static" under bias, but these should be sibling
properties of the root quantization config rather than nested; move "type:
dynamic" and "scale_bits: [4, 3]" out from inside block_sizes and place them at
the same level as block_sizes (e.g., alongside block_sizes: and bias:), and
likewise move "type: static" out from inside bias to be a sibling property (or
rename it if the schema expects a different bias quantization key); update the
entries for block_sizes, bias, type, and scale_bits accordingly so they are
top-level siblings in that config block.
In `@modelopt_recipes/configs/ptq/kv_nvfp4_rotate.yml`:
- Around line 22-27: The YAML nests type: dynamic and scale_bits: [4, 3] under
block_sizes within k_bmm_quantizer and v_bmm_quantizer; move those keys out of
the block_sizes mapping so block_sizes only contains size mappings (e.g., -1:
16) and place type and scale_bits as sibling keys of block_sizes under
k_bmm_quantizer and v_bmm_quantizer respectively, preserving enable and rotate
keys.
In `@modelopt_recipes/configs/ptq/w4_int4.yml`:
- Around line 19-22: The YAML has `type: static` incorrectly nested under
`block_sizes`; move the `type` key so it is a sibling of `block_sizes` (i.e.,
same indentation level) within the *weight_quantizer mapping and keep `enable:
true` as a sibling as well so the mapping contains `block_sizes`, `type`, and
`enable` at the same level consistent with other configs (e.g.,
w8a8_mxfp8_mxfp8.yml).
In `@modelopt_recipes/configs/ptq/w4_nvfp4_mlp_only.yml`:
- Around line 19-22: The YAML currently nests type and scale_bits inside the
block_sizes map; unindent the keys so that type and scale_bits are top-level
fields alongside block_sizes (i.e., align their indentation with block_sizes
rather than being children of it) to restore the expected quantizer schema; do
the same fix for the other identical block later in the file (the second
occurrence of block_sizes/type/scale_bits).
In `@modelopt_recipes/configs/ptq/w4a4_nvfp4_nvfp4.yml`:
- Around line 18-23: The YAML has misplaced keys: move the keys "type: dynamic"
and "scale_bits: [4, 3]" out from under the "block_sizes" mapping so they are
siblings of "block_sizes" (same indentation level as "block_sizes", "num_bits",
and "enable"); do the same correction for the second quantizer block (the other
occurrence containing "block_sizes" and "num_bits") so both quantizer blocks
have "type" and "scale_bits" at the correct top-level within each block.
In `@modelopt_recipes/configs/ptq/w4a8_int4_blockwise_fp8_fp8.yml`:
- Around line 17-25: The YAML has `type: static` incorrectly nested under
`block_sizes` within the `*weight_quantizer` mapping; move `type: static` out of
the `block_sizes` mapping so it is a sibling property of `num_bits` and
`block_sizes` under `*weight_quantizer` (i.e., align `type` with `num_bits` and
`enable`), and verify both quantizer entries (the single `num_bits: 4` block
with `block_sizes` and the `num_bits: [4, 3]` block) follow the same correct
indentation/structure so the parser treats `type`, `block_sizes`, `num_bits`,
and `enable` as peers rather than nesting `type` inside `block_sizes`.
In `@modelopt_recipes/configs/ptq/w4a8_mxfp4_fp8.yml`:
- Around line 24-26: The YAML has an incomplete key under the '*input_quantizer'
mapping: the 'axis:' key is empty which causes parsing to produce null; update
the mapping anchored by '*input_quantizer' to provide a concrete value (for
example "axis: 0" or "axis: -1" or the intended integer/list), or remove the
'axis:' line entirely if axis is not required—ensure the chosen value matches
other quantizer definitions (search for '*input_quantizer' and any uses of
'axis' in related quantizer configs to pick the correct value).
- Around line 17-23: The YAML has `type: dynamic` and `scale_bits: [8, 0]`
incorrectly nested under the `block_sizes` mapping for the '*weight_quantizer'
section; update the '*weight_quantizer' config so that `block_sizes` only
contains the size mapping (e.g., `-1: 32`) and move `type: dynamic` and
`scale_bits: [8, 0]` out to be sibling keys alongside `num_bits`, `block_sizes`,
and `enable` (i.e., ensure `num_bits`, `block_sizes`, `type`, `scale_bits`, and
`enable` are all at the same indentation level under '*weight_quantizer').
In `@modelopt_recipes/configs/ptq/w6a6_mxfp6_mxfp6.yml`:
- Around line 17-30: The YAML for both "*weight_quantizer" and
"*input_quantizer" has "type: dynamic" and "scale_bits: [8, 0]" mistakenly
nested under "block_sizes"; move "type" and "scale_bits" out so they are sibling
keys of "block_sizes" (i.e., ensure "block_sizes" only contains the mapping for
-1: 32 and "type" and "scale_bits" sit at the same level as "num_bits",
"block_sizes", and "enable"); update both quantizer blocks ("*weight_quantizer"
and "*input_quantizer") accordingly.
In `@modelopt_recipes/configs/ptq/w8a8_mxfp8_mxfp8.yml`:
- Around line 19-23: The YAML nests type: dynamic and scale_bits: [8, 0] under
the block_sizes mapping for both *weight_quantizer and *input_quantizer, which
makes them keys inside block_sizes; move the type and scale_bits entries out one
indentation level so they are sibling properties of block_sizes (i.e., keep
block_sizes: { -1: 32 } and add type: dynamic and scale_bits: [8, 0] alongside
block_sizes) for both *weight_quantizer and *input_quantizer.
In `@modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/kv_quant.yml`:
- Line 1: The file header is malformed because the `__base__` prefix was
accidentally prepended to the SPDX license comment; remove the `__base__` token
so the first line is a valid YAML comment (e.g., start with `#
SPDX-FileCopyrightText: ...`) to restore proper YAML syntax in kv_quant.yml and
ensure any base inclusion is handled via the intended YAML inheritance mechanism
rather than inline prefix.
In `@modelopt/torch/opt/config.py`:
- Around line 401-430: The current loader picks a config_path from
paths_to_check but then resolves entries in config_data["__base__"] by calling
load_config(base_cfg) directly, which lets bare base names resolve against the
caller CWD instead of the originating recipe library; update the base resolution
so that for each base_cfg you first resolve it relative to the chosen
config_path (or relative to BUILTIN_RECIPES_LIB when the matched path came from
that lib): when iterating bases = config_data.pop("__base__", []), if base_cfg
is a relative string/path (not absolute) join it with config_path.parent (or the
library path used to find config_path) and normalize suffixes (add .yml/.yaml
probing like the original search) before calling load_config on the fully
resolved Path; ensure you reference config_path, BUILTIN_RECIPES_LIB and
load_config in the change so built-in bases remain anchored to their recipe
library.
In `@modelopt/torch/recipe/config.py`:
- Around line 78-83: Replace the untyped dict kv_quant with a dedicated Pydantic
config model (e.g., class KVQuantConfig(BaseModel) or `@dataclass`) that mirrors
the fields/validation used by model_quant, then change the kv_quant field to use
that type (kv_quant: KVQuantConfig = ModeloptField(...)) and set a proper
default via default or default_factory (not a bare {}), keep
validate_default=True, and add the necessary import for the new KVQuantConfig so
schema validation and IDE hints are preserved for the KV cache quant settings.
In `@modelopt/torch/recipe/loader.py`:
- Around line 117-122: The code attempts to instantiate QuantizeQuantCfgType in
ModelOptPTQRecipe construction, but QuantizeQuantCfgType is a type alias (not
callable) so change the call site in the loader where ModelOptPTQRecipe(...) is
returned: stop using kv_quant=QuantizeQuantCfgType(**kv_quant_config) and
instead pass the raw dict (kv_quant=kv_quant_config) since
ModelOptPTQRecipe.kv_quant expects dict[str, Any]; keep
model_quant=QuantizeConfig(**model_quant_config) as-is.
In `@pyproject.toml`:
- Around line 121-122: The package-data section is missing an entry for the
modelopt_recipes package so its YAML recipe files aren’t included in built
distributions; update the [tool.setuptools.package-data] table to add a key for
modelopt_recipes that includes the recipe file patterns (e.g., "**/*.yml",
"**/*.yaml") so setuptools includes those files in wheels/sdists; ensure the new
entry mirrors the existing modelopt patterns but targets the modelopt_recipes
package name so recipe loading works after installation.
---
Outside diff comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 196-241: The function build_quant_cfg mutates the input quant_cfg
in multiple branches (moe_calib_experts_ratio, Gemma int8_sq, phi4mm), causing
callers to observe leaked changes; fix this by making a deep copy of quant_cfg
immediately on entry (e.g., quant_cfg = copy.deepcopy(quant_cfg)) and then
perform all modifications on that copy (so weight_quantizer,
quant_cfg["algorithm"], and quant_cfg["quant_cfg"] edits do not affect the
caller's object). Ensure the deepcopy happens before any reads/writes that may
alter nested structures in build_quant_cfg.
---
Minor comments:
In `@modelopt_recipes/configs/ptq/kv_fp8_affine.yml`:
- Around line 16-22: In the quant_cfg block for '*[kv]_bmm_quantizer' add the
missing sibling property "enable: true" (next to num_bits and bias) so the entry
reads the same structure as other configs (e.g., kv_fp8.yml and
kv_nvfp4_affine.yml); update the YAML under the '*[kv]_bmm_quantizer' key to
include enable: true directly beneath num_bits and alongside bias to enable the
quantizer.
In `@modelopt_recipes/configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml`:
- Around line 1-2: The file name contains a duplicated token ("_nvfp4_mlp_only")
— rename the config file from w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml to
w4a4_nvfp4_mlp_only.yml and update any references to this file (e.g., in tests,
loaders, or CI) so they point to the new filename to keep naming consistent with
other configs.
In `@modelopt_recipes/configs/ptq/w8a8_int8_per_channel_int8.yml`:
- Around line 20-22: The YAML fragment under the "*input_quantizer" mapping
currently contains a blank "axis:" key which parses to null but is ambiguous;
update the mapping by either setting "axis: null" explicitly or removing the
"axis" key entirely so the schema default (None) is used—apply the same change
wherever you see the blank "axis:" pattern (e.g., in the w8a8_fp8_fp8.yml and
w4a8_mxfp4_fp8.yml configs) to keep the intent consistent with Python tests and
code.
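For reference, PyYAML loads a bare key with no value to `None`, so `axis:` and `axis: null` are equivalent once parsed; the ambiguity flagged above is purely for human readers. A minimal check, assuming PyYAML is installed:

```python
import yaml

blank = yaml.safe_load("axis:")          # blank value after the colon
explicit = yaml.safe_load("axis: null")  # explicit null token

# Both forms load to the same Python mapping.
assert blank == {"axis": None}
assert blank == explicit
```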
In `@modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv/recipe.yml`:
- Line 18: The description string currently reads "FP8 2D blockwise weights, FP8
activation, FP8 KV Cache, max calibration." but the recipe is weight-only (see
weight_only and base model_quant.yml reference to
configs/ptq/w8_fp8_2d_blockwise); update the description to accurately reflect
that only weights are quantized (e.g., "FP8 2D blockwise weight-only
quantization, FP8 KV Cache, max calibration") and remove or reword any mention
of FP8 activation so catalog metadata matches the recipe name and base config.
In `@modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/recipe.yml`:
- Line 18: Description string contains a typo "per tokenactivation"; update the
description value in recipe.yml (the description field) to read "FP8 per channel
weight, FP8 per token activation, FP8 KV Cache, max calibration." — locate the
description key in
modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv/recipe.yml and
insert the missing space between "token" and "activation".
In `@modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/model_quant.yml`:
- Around line 16-19: Update the duplicated config reference: replace the base
entry "configs/ptq/w4a4_nvfp4_mlp_only_nvfp4_mlp_only" with the corrected name
"configs/ptq/w4a4_nvfp4_mlp_only" in the __base__ list inside model_quant.yml;
also ensure the actual file on disk is renamed from
w4a4_nvfp4_mlp_only_nvfp4_mlp_only.yml to w4a4_nvfp4_mlp_only.yml so the new
reference resolves correctly.
In `@modelopt_recipes/general/README`:
- Line 3: Fix the typo in the README line that currently reads "ptq/*:
model-agnostic general Post Traing Quantization recipes" by changing "Traing" to
"Training" so it reads "ptq/*: model-agnostic general Post Training Quantization
recipes"; locate the string containing "ptq/*" or "Post Traing Quantization" and
update it accordingly.
In `@modelopt/torch/opt/config.py`:
- Around line 426-430: The code reads YAML into config_data and then assumes
it's a mapping when building bases in load_config; validate that config_data is
a dict after yaml.safe_load(content) and raise a clear ValueError if not (e.g.,
for None, list, or scalar documents) so recipe load failures remain actionable;
update the logic in the function around config_data and the bases =
[load_config(...)] line to check isinstance(config_data, dict) and raise with a
message referencing config_path and the unexpected type.
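A sketch of the suggested guard; the real `load_config` signature differs, this only illustrates the type check on the parsed document (assuming PyYAML):

```python
import yaml


def load_mapping(config_path: str, content: str) -> dict:
    """Parse YAML and insist on a mapping at the top level."""
    config_data = yaml.safe_load(content)
    if not isinstance(config_data, dict):
        # Empty documents load to None, sequences to list, scalars to
        # str/int/etc. -- all of which would crash later with a cryptic error.
        raise ValueError(
            f"{config_path}: expected a YAML mapping at the top level, "
            f"got {type(config_data).__name__}"
        )
    return config_data


assert load_mapping("ok.yml", "quant_cfg: {}") == {"quant_cfg": {}}
```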
In `@modelopt/torch/quantization/config.py`:
- Around line 845-853: The tuple_num_bits field validator currently uses assert
to check list length, which can be disabled; update the validator
(tuple_num_bits in modelopt/torch/quantization/config.py) to raise a ValueError
(or TypeError) with a clear message when num_bits is a list but len(num_bits) !=
2, then return the cast tuple as before; leave other behavior unchanged.
In `@modelopt/torch/recipe/config.py`:
- Around line 35-38: The docstrings in the public configuration classes in
modelopt/torch/recipe/config.py currently contain incorrect, copied text about
layer-rule behavior; update the docstrings to accurately describe each class's
purpose and public API (e.g., the base configuration class for model
optimization recipes) and remove the unrelated sentence about matching
"*output_layer*" and replacing attributes; ensure both occurrences (the base
config class and the other public config class in this file) describe their
configuration fields, defaults, and intended usage for generated docs and IDE
help.
In `@modelopt/torch/recipe/loader.py`:
- Around line 76-79: Replace the fragile assert-based checks for the incoming
recipe_descriptor with explicit validation that raises ValueError: check that
recipe_descriptor is a dict and if not raise ValueError("Recipe descriptor
should be a dictionary."); then check that "recipe_type" is present and if not
raise ValueError("Recipe descriptor should contain 'recipe_type' field.").
Update the validation in loader.py where recipe_descriptor is handled so the
function (the block containing the current assert lines) raises these
ValueErrors instead of using assert.
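The replacement validation might look like the following; the messages come from the comment above, while the surrounding loader code is assumed:

```python
from typing import Any


def validate_recipe_descriptor(recipe_descriptor: Any) -> dict:
    """Raise ValueError (not assert) so the checks survive `python -O`."""
    if not isinstance(recipe_descriptor, dict):
        raise ValueError("Recipe descriptor should be a dictionary.")
    if "recipe_type" not in recipe_descriptor:
        raise ValueError("Recipe descriptor should contain 'recipe_type' field.")
    return recipe_descriptor


ok = validate_recipe_descriptor({"recipe_type": "ptq"})
assert ok["recipe_type"] == "ptq"
```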
---
Nitpick comments:
In `@modelopt_recipes/configs/ptq/calib_algo_none.yml`:
- Line 16: The YAML key "algorithm" currently parses to null implicitly; make
the "no calibration" intent explicit by setting the algorithm field to an
explicit null sentinel (e.g., use the YAML null token) so readers and config
composition logic see the intended "no calibration" value for the algorithm key.
In `@modelopt_recipes/configs/ptq/kv_fp8.yml`:
- Around line 16-19: Add a short inline comment next to the num_bits entry in
the '*[kv]_bmm_quantizer' block explaining that num_bits: [4, 3] specifies the
FP8 format (4 exponent bits and 3 mantissa/significand bits, i.e., E4M3) and
note any relevant bias/representation detail or mapping used by your quantizer
implementation so future editors understand the encoding expected by kv_fp8.yml
and the quantizer code paths (e.g., references to E4M3 vs E3M4 conventions).
In `@modelopt_recipes/configs/README`:
- Around line 1-3: Update the README sentence "The recipes can refer to the
configs using path of: configs/..." to clearer wording; replace it with "Recipes
can reference shared configs via paths under configs/..." so the reference root
is explicit and reads naturally (edit the README line containing that original
phrase).
In `@modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv/recipe.yml`:
- Line 18: Update the description string in recipe.yml to a clearer phrasing
that matches the folder name and intent: replace the current description value
with something like "NVFP4 MLP weight-only, no activation quantization, FP8 KV
cache, max calibration" so it reads cleanly and explicitly references
weight-only MLP, no activation quantization, FP8 KV cache, and max calibration.
In `@modelopt/torch/quantization/config.py`:
- Around line 149-168: The module currently performs a module-level assert
comparing OmegaConf.to_container(OmegaConf.create({"quant_cfg":
_default_disabled_quantizer_cfg}), resolve=True) to
load_config("configs/ptq/base"), which can break imports or be bypassed under
python -O; replace that assert with an explicit runtime check that raises a
clear RuntimeError if the configs mismatch (e.g., compute the two values using
OmegaConf.to_container and load_config and if they differ raise RuntimeError
with a descriptive message), or alternatively move this validation into a
dedicated unit test or a lazily-invoked validator function called on first use;
ensure you update the code around _default_disabled_quantizer_cfg,
OmegaConf.to_container/OmegaConf.create, and load_config(...) accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 215f48fb-1b5f-4f50-a528-f88414905cc3
📒 Files selected for processing (118)
- examples/llm_ptq/ (example_utils.py, hf_ptq.py, multinode_ptq.py)
- modelopt/torch/opt/config.py
- modelopt/torch/quantization/config.py
- modelopt/torch/recipe/ (__init__.py, config.py, loader.py)
- modelopt_recipes/__init__.py
- modelopt_recipes/configs/README and modelopt_recipes/configs/ptq/*.yml (base, calib_algo_*, kv_*, mha_fp8, and the w4/w6/w8 weight-activation presets)
- modelopt_recipes/general/README and modelopt_recipes/general/ptq/*/ (recipe.yml, model_quant.yml, kv_quant.yml; plus a README for nvfp4_default-fp8_kv) across the fp8, int4, int8, mxfp4, mxfp6, mxfp8, mxint8, nvfp4, and w4a8 recipe families
- pyproject.toml
- tests/_test_utils/torch/quantization/quantize_common.py
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1000 +/- ##
==========================================
+ Coverage 70.11% 70.31% +0.19%
==========================================
Files 221 225 +4
Lines 25459 25727 +268
==========================================
+ Hits 17851 18090 +239
- Misses 7608 7637 +29
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 801-809: The tuple_num_bits validator currently uses an assert to
check list length which can be disabled and is inconsistent with other
validators; update tuple_num_bits (the `@field_validator` for "num_bits") to raise
a ValueError when a provided list does not have length 2, e.g. replace the
assert len(num_bits) == 2 with an explicit if check that raises
ValueError("num_bits list must have length 2") so invalid inputs are rejected
consistently.
- Around line 972-980: The return annotation for the validator
tuple_block_sizes_scale_bits is wrong: it declares -> int | tuple[int, int] but
the validator receives and returns the block_sizes dict (possibly mutated).
Update the type annotation to reflect that it returns a dict (e.g., Mapping[str,
Any] or dict[str, Any]) and keep the logic that converts v["scale_bits"] from
list to tuple when present; ensure the decorator `@field_validator`("block_sizes",
mode="before") and method name tuple_block_sizes_scale_bits remain unchanged so
the validator still runs for the block_sizes field.
- Around line 148-152: The _default_disabled_quantizer_cfg uses fully-qualified
keys ("torch.nn.BatchNorm1d", etc.) that don't match the registry keys; update
the dict keys in _default_disabled_quantizer_cfg to the registered identifiers
("nn.BatchNorm1d", "nn.BatchNorm2d", "nn.BatchNorm3d", "nn.LeakyReLU") so
set_quantizer_by_cfg() can correctly find entries in QuantModuleRegistry (see
quant_batchnorm.py and quant_activations.py) and properly disable BatchNorm and
LeakyReLU quantizers.
📒 Files selected for processing (1)
modelopt/torch/quantization/config.py

Actionable comments posted: 4
♻️ Duplicate comments (2)
modelopt/torch/quantization/config.py (2)
852-861: ⚠️ Potential issue | 🟡 Minor
Use `ValueError` instead of `assert` for input validation. Line 857 uses `assert len(num_bits) == 2`, which is inconsistent with other validators in this class (e.g., `validate_num_bits`, `validate_bias`) that raise `ValueError`. Additionally, assertions can be disabled with `python -O`, allowing invalid inputs to pass silently.
📝 Proposed fix
`@field_validator`("num_bits", mode="before")
`@classmethod`
def tuple_num_bits(cls, num_bits: int | list[int] | tuple[int, int]) -> int | tuple[int, int]:
    """Convert num_bits to tuple if list."""
    if isinstance(num_bits, list):
-        assert len(num_bits) == 2
+        if len(num_bits) != 2:
+            raise ValueError(f"num_bits list must have exactly 2 elements, got {len(num_bits)}")
        return cast("tuple[int, int]", tuple(num_bits))
    return num_bits
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 852 - 861, The tuple_num_bits field validator uses an assert to check list length; replace the assert with raising a ValueError when len(num_bits) != 2 so invalid inputs cannot slip through when Python assertions are disabled. In the tuple_num_bits method (decorated with `@field_validator`("num_bits", mode="before")), validate that a list has exactly two elements and raise ValueError("num_bits list must have length 2") (or similar) before casting and returning the tuple; keep the existing cast("tuple[int, int]", tuple(num_bits)) and return semantics unchanged.
1023-1031: ⚠️ Potential issue | 🟡 Minor
Incorrect return type annotation. The return type `-> int | tuple[int, int]` is incorrect. This validator operates on `block_sizes`, which is a dict, and returns the (potentially modified) dict `v`.
📝 Proposed fix
`@field_validator`("block_sizes", mode="before")
`@classmethod`
-def tuple_block_sizes_scale_bits(cls, v) -> int | tuple[int, int]:
+def tuple_block_sizes_scale_bits(
+    cls, v: dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None
+) -> dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None:
    """Convert block_sizes.scale_bits to tuple if list."""
    if v and v.get("scale_bits"):
        scale_bits = v.get("scale_bits")
        if isinstance(scale_bits, list):
            v["scale_bits"] = tuple(scale_bits)
    return v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1023 - 1031, The validator tuple_block_sizes_scale_bits is annotated to return int | tuple[int, int] but it actually accepts and returns the block_sizes dict (possibly mutated); update its return type annotation to reflect a dict (e.g., dict[str, Any] or Mapping[str, Any]) so the `@field_validator` signature matches the actual behavior and avoid type-checker complaints.
🧹 Nitpick comments (1)
modelopt/torch/quantization/config.py (1)
1033-1050: Existing validators use assertions inconsistently.
For future consideration: the `validate_block_sizes` validator uses `assert` statements (lines 1039, 1041, 1046, 1048) instead of raising `ValueError`, which is inconsistent with the pattern used in `validate_bias` and other validators. This is existing code and not blocking, but worth noting for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1033 - 1050, The validator function validate_block_sizes uses Python assert statements for input validation which should be replaced with explicit exceptions for consistent runtime validation: locate validate_block_sizes (decorated with `@field_validator`) and replace each assert with raising ValueError (or TypeError where appropriate) including the same descriptive messages — e.g., check info.data["axis"] is None and raise ValueError("axis must be None when block_sizes is not None."), when dynamic type is detected use cls._get_block_quant_axes_and_sizes(v) and raise ValueError("Dynamic block quantization only supports quantization last axis.") if the length check fails, and for the key/value checks raise ValueError describing invalid keys or types (e.g., "invalid block_sizes key" or "block_sizes values must be int or None"). Ensure the validator still returns v on success.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/recipe.yml`:
- Around line 18-19: Update the YAML description to accurately reflect the
recipe variant: replace the current "Mamba MoE FP8 Conservative with FP8 KV
cache." text in the description field with a phrase matching the file name and
variant (e.g., "Mamba MoE NVFP4 Aggressive with FP8 KV cache.") so the metadata
aligns with mamba_moe_nvfp4_aggressive-fp8_kv; ensure the description string
mentions NVFP4 and Aggressive to avoid misleading catalog entries.
In `@modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/kv_quant.yml`:
- Line 1: Remove the stray "__base__" token that was prepended to the license
header in kv_quant.yml so the SPDX comment is a proper YAML comment (i.e.,
restore the header to start with the comment text only), leaving the existing
"__base__:" mapping later in the file unchanged; specifically, delete the
leading "__base__" at the top of the file so the license header is not treated
as a YAML scalar and does not conflict with the "__base__:" entry.
In `@modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/model_quant.yml`:
- Around line 16-19: The __base__ list in model_quant.yml is missing the KV
preset; update the __base__ array to include the configs/ptq/kv_fp8.yml entry so
the FP8 KV cache quantization is actually enabled (add "configs/ptq/kv_fp8"
alongside the existing entries). Ensure the same change is applied to the
sibling recipe nvfp4_mlp_only-fp8_kv so both recipes include the kv_fp8 preset.
In
`@modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/recipe.yml`:
- Around line 16-18: The description string in recipe.yml claims "max
calibration" but the PTQ config pointed to by model_quant.yml uses the MSE
calibrator at configs/ptq/calib_algo_mse; update the description field to
accurately state MSE calibration (e.g., replace "max calibration" with "MSE
calibration" or similar) so the recipe description matches the actual
calibration algorithm referenced.
---
Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 852-861: The tuple_num_bits field validator uses an assert to
check list length; replace the assert with raising a ValueError when
len(num_bits) != 2 so invalid inputs cannot slip through when Python assertions
are disabled. In the tuple_num_bits method (decorated with
`@field_validator`("num_bits", mode="before")), validate that a list has exactly
two elements and raise ValueError("num_bits list must have length 2") (or
similar) before casting and returning the tuple; keep the existing
cast("tuple[int, int]", tuple(num_bits)) and return semantics unchanged.
- Around line 1023-1031: The validator tuple_block_sizes_scale_bits is annotated
to return int | tuple[int, int] but it actually accepts and returns the
block_sizes dict (possibly mutated); update its return type annotation to
reflect a dict (e.g., dict[str, Any] or Mapping[str, Any]) so the
`@field_validator` signature matches the actual behavior and avoid type-checker
complaints.
---
Nitpick comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1033-1050: The validator function validate_block_sizes uses Python
assert statements for input validation which should be replaced with explicit
exceptions for consistent runtime validation: locate validate_block_sizes
(decorated with `@field_validator`) and replace each assert with raising
ValueError (or TypeError where appropriate) including the same descriptive
messages — e.g., check info.data["axis"] is None and raise ValueError("axis must
be None when block_sizes is not None."), when dynamic type is detected use
cls._get_block_quant_axes_and_sizes(v) and raise ValueError("Dynamic block
quantization only supports quantization last axis.") if the length check fails,
and for the key/value checks raise ValueError describing invalid keys or types
(e.g., "invalid block_sizes key" or "block_sizes values must be int or None").
Ensure the validator still returns v on success.
📒 Files selected for processing (25)
- modelopt/torch/quantization/config.py
- modelopt_recipes/configs/ptq/base_mamba.yml
- modelopt_recipes/configs/ptq/calib_algo_local_hessian.yml
- modelopt_recipes/configs/ptq/calib_algo_mse.yml
- modelopt_recipes/configs/ptq/w4a4_nvfp4_static_nvfp4.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/ (kv_quant.yml, model_quant.yml, recipe.yml)
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/ (kv_quant.yml, model_quant.yml, recipe.yml)
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/ (kv_quant.yml, model_quant.yml, recipe.yml)
- modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/ (kv_quant.yml, model_quant.yml, recipe.yml)
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/ (README, kv_quant.yml, model_quant.yml, recipe.yml)
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/ (README, kv_quant.yml, model_quant.yml, recipe.yml)
✅ Files skipped from review due to trivial changes (9)
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/recipe.yml
- modelopt_recipes/configs/ptq/calib_algo_local_hessian.yml
- modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/recipe.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/recipe.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/README
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/model_quant.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/README
cjluo-nv left a comment
Review: No-Behavior-Change Correctness
I found one real bug in the new recipe path, plus a few other discrepancies worth flagging.
🐛 Bug: Recipe KV quantization is broken (hf_ptq.py)
In the new recipe branch:
quant_cfg = mtq.update_quant_cfg_with_kv_cache_quant(
    quant_cfg,
    recipe.kv_quant,  # ← wrong
)

recipe.kv_quant is the full config dict loaded from kv_quant.yml (e.g. {"quant_cfg": {"*[kv]_bmm_quantizer": {...}}}).
But update_quant_cfg_with_kv_cache_quant expects the quantizer patterns sub-dict as the second argument (not the full config wrapper). It does:

quant_cfg["quant_cfg"].update(kv_cache_quant_cfg)

So with the current code, this updates the patterns dict with {"quant_cfg": {"*[kv]_bmm_quantizer": ...}}, adding a nested "quant_cfg" key at the patterns level. KV quantization is silently not applied.
The old non-recipe path correctly passes FP8_KV_CFG["quant_cfg"] (the sub-dict). The fix should be:

quant_cfg = mtq.update_quant_cfg_with_kv_cache_quant(
    quant_cfg,
    recipe.kv_quant["quant_cfg"],  # ← take the sub-dict
)

Or redesign recipe.kv_quant to store just the patterns directly (and update the YAML/loader accordingly).
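The nesting mistake is easy to reproduce with plain dicts; the helper below mirrors the quoted `quant_cfg["quant_cfg"].update(...)` behavior and is otherwise independent of modelopt:

```python
def update_quant_cfg_with_kv_cache_quant(quant_cfg: dict, kv_cache_quant_cfg: dict) -> dict:
    # Mirrors the real helper: merge quantizer patterns into quant_cfg["quant_cfg"].
    quant_cfg["quant_cfg"].update(kv_cache_quant_cfg)
    return quant_cfg


kv_patterns = {"*[kv]_bmm_quantizer": {"num_bits": [4, 3], "enable": True}}
full_kv_config = {"quant_cfg": kv_patterns}  # what recipe.kv_quant holds

# Passing the full wrapper nests a bogus "quant_cfg" key inside the patterns level:
broken = update_quant_cfg_with_kv_cache_quant({"quant_cfg": {}}, full_kv_config)
assert "*[kv]_bmm_quantizer" not in broken["quant_cfg"]

# Passing the sub-dict lands the pattern where the quantizer lookup expects it:
fixed = update_quant_cfg_with_kv_cache_quant({"quant_cfg": {}}, full_kv_config["quant_cfg"])
assert "*[kv]_bmm_quantizer" in fixed["quant_cfg"]
```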
⚠️ Behavior change: NVFP4_FP8_MHA_CONFIG (modelopt/torch/quantization/config.py)
The old config had only "default": {"enable": False} in quant_cfg. The new version replaces that with **_default_disabled_quantizer_cfg, which includes many additional explicit disable patterns (*lm_head*, *block_sparse_moe.gate*, *router*, etc.).
The assert confirms that the YAML matches the updated _DEPRECATED dict, so the two stay consistent, but it IS a semantic change in which patterns are disabled. This should be called out explicitly rather than claimed as "no behavior change."
⚠️ Intentional behavior change: auto_quantize loop variable bug fix (hf_ptq.py)
# Old (buggy):
assert all(
args.qformat in [...]
for args.qformat in qformat_list # mutates args.qformat as side effect!
)
# New (correct):
assert all(
qformat in [...]
for qformat in qformat_list
)

This is a legitimate bug fix, but it IS a behavior change: the old code had args.qformat set to the last element of qformat_list after the all() call, which could affect downstream code. Worth acknowledging in the PR description.
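The side effect in the old comprehension is reproducible with any attribute target; `args` here is a stand-in argparse-style namespace, not the script's real parser output:

```python
from types import SimpleNamespace

args = SimpleNamespace(qformat="fp8")
qformat_list = ["int4_awq", "nvfp4"]

# `for args.qformat in ...` is legal Python: the loop target is an attribute
# reference, so each iteration assigns to args.qformat on the object.
assert all(args.qformat in ["int4_awq", "nvfp4", "fp8"] for args.qformat in qformat_list)
assert args.qformat == "nvfp4"  # silently overwritten with the last element

# The fixed form binds a local loop variable and leaves args untouched.
args = SimpleNamespace(qformat="fp8")
assert all(qformat in ["int4_awq", "nvfp4", "fp8"] for qformat in qformat_list)
assert args.qformat == "fp8"
```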
💬 Minor: kv_quant.yml is required even for recipes that don't do KV quantization (modelopt/torch/recipe/loader.py)
_load_ptq_recipe raises ValueError if no kv_quant.yml is found. But a recipe might legitimately skip KV quantization. Consider making it optional and defaulting recipe.kv_quant to {}.
✅ Things that look correct
- tuple→list conversion for num_bits and scale_bits in YAML, with the tuple_num_bits / tuple_block_sizes_scale_bits validators in QuantizerAttributeConfig to normalize them back: a clean approach.
- The __base__ YAML inheritance mechanism via OmegaConf merge is well-structured.
- The assert-based equivalence checks for all the _DEPRECATED configs are a good regression guard.
- Non-recipe paths in hf_ptq.py and multinode_ptq.py look functionally equivalent to before.
cjluo-nv
left a comment
Review: No-Behavior-Change Correctness
I found one real bug in the new recipe path, plus a few other discrepancies.
Bug: Recipe KV quantization is broken (hf_ptq.py)
In the new recipe branch:
```python
quant_cfg = mtq.update_quant_cfg_with_kv_cache_quant(
    quant_cfg,
    recipe.kv_quant,  # wrong
)
```

`recipe.kv_quant` is the full config dict loaded from `kv_quant.yml`, e.g. `{"quant_cfg": {"*[kv]_bmm_quantizer": {...}}}`.
But update_quant_cfg_with_kv_cache_quant expects the quantizer patterns sub-dict as its second argument. It does:
```python
quant_cfg["quant_cfg"].update(kv_cache_quant_cfg)
```

So the current code ends up inserting a nested `"quant_cfg"` key into the patterns dict, and KV quantization is silently not applied.
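The failure mode can be reproduced with plain dicts; the helper below is a stand-in mirroring the documented merge behavior, not the real ModelOpt implementation:

```python
def update_quant_cfg_with_kv_cache_quant(quant_cfg, kv_cache_quant_cfg):
    # Stand-in for the real helper: merge KV quantizer patterns into
    # the existing pattern dict under "quant_cfg".
    quant_cfg["quant_cfg"].update(kv_cache_quant_cfg)
    return quant_cfg

# What kv_quant.yml loads to: a wrapper dict around the patterns.
kv_yaml = {"quant_cfg": {"*[kv]_bmm_quantizer": {"num_bits": (4, 3)}}}

# Wrong: passing the whole wrapper nests a "quant_cfg" key inside the
# patterns dict, so the KV pattern never reaches the quantizer matching.
broken = update_quant_cfg_with_kv_cache_quant({"quant_cfg": {}}, kv_yaml)
assert "*[kv]_bmm_quantizer" not in broken["quant_cfg"]  # pattern lost
assert "quant_cfg" in broken["quant_cfg"]                # nested junk key

# Right: pass the patterns sub-dict.
fixed = update_quant_cfg_with_kv_cache_quant({"quant_cfg": {}}, kv_yaml["quant_cfg"])
assert "*[kv]_bmm_quantizer" in fixed["quant_cfg"]
```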
The old non-recipe path correctly passes FP8_KV_CFG["quant_cfg"]. Fix should be:
```python
quant_cfg = mtq.update_quant_cfg_with_kv_cache_quant(
    quant_cfg,
    recipe.kv_quant["quant_cfg"],  # take the sub-dict
)
```

Behavior change: NVFP4_FP8_MHA_CONFIG (modelopt/torch/quantization/config.py)
The old config had only "default": {"enable": False} in quant_cfg. The new version replaces that with **_default_disabled_quantizer_cfg, which includes many additional explicit disable patterns (*lm_head*, *block_sparse_moe.gate*, *router*, etc.). This is a semantic change even though the assert passes (the assert was also updated to match).
Behavior change / bug fix: auto_quantize loop variable (hf_ptq.py)
```python
# Old (buggy): mutates args.qformat as a side effect
assert all(args.qformat in [...] for args.qformat in qformat_list)

# New (correct):
assert all(qformat in [...] for qformat in qformat_list)
```

This is a legitimate fix but IS a behavior change: the old code left args.qformat set to the last element of qformat_list after the assertion.
Minor: kv_quant.yml required even for recipes without KV quantization (modelopt/torch/recipe/loader.py)
_load_ptq_recipe raises ValueError if kv_quant.yml is missing. A recipe that skips KV quantization would fail to load. Consider making it optional.
Things that look correct
- tuple→list conversion for num_bits/scale_bits in YAML, with validators to normalize back — clean.
- The base YAML inheritance via OmegaConf merge is well-structured.
- The assert-based equivalence checks for all _DEPRECATED configs are a solid regression guard.
- Non-recipe paths in hf_ptq.py and multinode_ptq.py look functionally equivalent to before.
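The `__base__` inheritance praised above can be sketched with a hand-rolled recursive merge; OmegaConf's `merge` behaves similarly for nested mappings, but the helper names and the example config library here are purely illustrative:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base, later values winning."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out


def load_with_bases(config: dict, library: dict) -> dict:
    """Resolve a __base__ list: merge parents in order, then the child on top."""
    merged: dict = {}
    for base_name in config.pop("__base__", []):
        merged = deep_merge(merged, load_with_bases(dict(library[base_name]), library))
    return deep_merge(merged, config)


# Hypothetical library entry standing in for a built-in base config.
library = {
    "configs/ptq/base": {"quant_cfg": {"*weight_quantizer": {"num_bits": 8}}},
}
child = {
    "__base__": ["configs/ptq/base"],
    "quant_cfg": {"*weight_quantizer": {"num_bits": [2, 1]}},  # child override wins
}
resolved = load_with_bases(child, library)
assert resolved["quant_cfg"]["*weight_quantizer"]["num_bits"] == [2, 1]
```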
cjluo-nv
left a comment
test comment from Claude Code review
1a68373 to 1b4216b
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/torch/quantization/config.py (1)
1541-1565: ⚠️ Potential issue | 🟡 Minor

Remove debug print statements.

Lines 1561 and 1564 contain debug `print` statements.

Proposed fix

```diff
 # quantization like W4A8 has a list of weight quantizers
 if isinstance(cfg, list):
     for _config in cfg:
         if _not_dynamic(_config):
-            print(f"{cfg}: True")
             return True
 elif _not_dynamic(cfg):
-    print(f"{cfg}: True")
     return True
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1541 - 1565, In need_calibration(), remove the two debug print statements that emit cfg to stdout (the prints inside the loop for list items and the single-cfg branch) so the function does not produce unexpected console output; either delete those print calls or replace them with a proper logger.debug call using the module logger (e.g., logging.getLogger(__name__).debug) if you need the diagnostics. Ensure the changes are made inside the need_calibration function and do not alter the _not_dynamic helper logic or return behavior.
♻️ Duplicate comments (2)
modelopt/torch/quantization/config.py (2)
1027-1035: ⚠️ Potential issue | 🟡 Minor

Incorrect return type annotation.

The return type `-> int | tuple[int, int]` is incorrect. This validator operates on `block_sizes`, which is a dict, and returns the (potentially modified) dict `v`.

Proposed fix

```diff
 @field_validator("block_sizes", mode="before")
 @classmethod
-def tuple_block_sizes_scale_bits(cls, v) -> int | tuple[int, int]:
+def tuple_block_sizes_scale_bits(
+    cls, v: dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None
+) -> dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None:
     """Convert block_sizes.scale_bits to tuple if list."""
     if v and v.get("scale_bits"):
         scale_bits = v.get("scale_bits")
         if isinstance(scale_bits, list):
             v["scale_bits"] = tuple(scale_bits)
     return v
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1027 - 1035, The validator tuple_block_sizes_scale_bits is annotated to return int | tuple[int, int] but it actually receives and returns the block_sizes dict (possibly modified); update the return type annotation to reflect a mapping/dict (e.g., dict[str, Any] or the appropriate BlockSizes type) and adjust imports if needed, keeping the function name tuple_block_sizes_scale_bits, the field_validator("block_sizes") decorator, and the scale_bits key handling unchanged.
856-864: ⚠️ Potential issue | 🟡 Minor

Use `ValueError` instead of `assert` for input validation.

Line 861 uses `assert len(num_bits) == 2`, which is inconsistent with other validators in this class (e.g., `validate_num_bits`, `validate_bias`) that raise `ValueError`. Assertions can be disabled with `python -O`, allowing invalid inputs to pass silently.

Proposed fix

```diff
 @field_validator("num_bits", mode="before")
 @classmethod
 def tuple_num_bits(cls, num_bits: int | list[int] | tuple[int, int]) -> int | tuple[int, int]:
     """Convert num_bits to tuple if list."""
     if isinstance(num_bits, list):
-        assert len(num_bits) == 2
+        if len(num_bits) != 2:
+            raise ValueError(f"num_bits list must have exactly 2 elements, got {len(num_bits)}")
         return cast("tuple[int, int]", tuple(num_bits))
     return num_bits
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 856 - 864, The tuple_num_bits field validator uses an assert to check list length which can be skipped under -O; replace the assert in tuple_num_bits with an explicit check that raises ValueError (with a clear message like "num_bits list must have length 2") when len(num_bits) != 2, then return the casted tuple as before so behavior matches other validators (e.g., validate_num_bits, validate_bias).
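The `-O` hazard behind this finding is concrete: `python -O` sets `__debug__` to `False` and strips `assert` statements entirely, so only an explicit raise survives in optimized runs. A minimal standalone sketch of the validator logic (not the actual pydantic-decorated ModelOpt method):

```python
def tuple_num_bits(num_bits):
    """Normalize a two-element num_bits list to a tuple, validating length."""
    if isinstance(num_bits, list):
        if len(num_bits) != 2:  # explicit check: still enforced under python -O
            raise ValueError(
                f"num_bits list must have exactly 2 elements, got {len(num_bits)}"
            )
        return tuple(num_bits)
    return num_bits


assert tuple_num_bits([4, 3]) == (4, 3)  # list normalized to tuple
assert tuple_num_bits(8) == 8            # scalars pass through

try:
    tuple_num_bits([4, 3, 2])
except ValueError as err:
    assert "2 elements" in str(err)
```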
🧹 Nitpick comments (2)
modelopt/torch/quantization/config.py (2)
815-815: Type annotation includes pre-validation types.

The `list[int]` type in the annotation represents valid input that gets converted to `tuple[int, int]` by the `tuple_num_bits` validator. This is acceptable but could be clarified with a comment noting that `list[int]` inputs are converted to tuples during validation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` at line 815, The type annotation for the num_bits field currently includes list[int], which is accepted then converted to tuple[int,int] by the tuple_num_bits validator; update the declaration for num_bits (the ModeloptField definition) to keep the existing union types but add a one-line comment explaining that list[int] inputs are normalized to tuple[int,int] by the tuple_num_bits validator so readers understand the pre-validation shape conversion.
166-178: Consider explicit checks for config parity validation.

The `assert` statements for config parity (e.g., lines 166, 178) will be disabled when Python runs with the `-O` flag. If these checks are critical for production correctness, consider using explicit `if ... raise RuntimeError(...)` instead of `assert`.

However, if these are purely development-time sanity checks that will be caught in CI, the current approach is acceptable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 166 - 178, Replace the two assert-based parity checks with explicit runtime checks that raise on mismatch: instead of using assert _default_disabled_quantizer_cfg_deprecated == _default_disabled_quantizer_cfg and assert _mamba_moe_disabled_quantizer_cfg_deprecated == _mamba_moe_disabled_quantizer_cfg, implement explicit comparisons and raise a RuntimeError (or ValueError) with a clear message referencing the mismatched symbols (_default_disabled_quantizer_cfg_deprecated, _default_disabled_quantizer_cfg, _mamba_moe_disabled_quantizer_cfg_deprecated, _mamba_moe_disabled_quantizer_cfg) so these critical config-parity validations remain active even when Python is run with -O.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1541-1565: In need_calibration(), remove the two debug print
statements that emit cfg to stdout (the prints inside the loop for list items
and the single-cfg branch) so the function does not produce unexpected console
output; either delete those print calls or replace them with a proper
logger.debug call using the module logger (e.g.,
logging.getLogger(__name__).debug) if you need the diagnostics. Ensure the
changes are made inside the need_calibration function and do not alter the
_not_dynamic helper logic or return behavior.
---
Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1027-1035: The validator tuple_block_sizes_scale_bits is annotated
to return int | tuple[int, int] but it actually receives and returns the
block_sizes dict (possibly modified); update the return type annotation to
reflect a mapping/dict (e.g., dict[str, Any] or the appropriate BlockSizes type)
and adjust imports if needed, keeping the function name
tuple_block_sizes_scale_bits, the field_validator("block_sizes") decorator, and
the scale_bits key handling unchanged.
- Around line 856-864: The tuple_num_bits field validator uses an assert to
check list length which can be skipped under -O; replace the assert in
tuple_num_bits with an explicit check that raises ValueError (with a clear
message like "num_bits list must have length 2") when len(num_bits) != 2, then
return the casted tuple as before so behavior matches other validators (e.g.,
validate_num_bits, validate_bias).
---
Nitpick comments:
In `@modelopt/torch/quantization/config.py`:
- Line 815: The type annotation for the num_bits field currently includes
list[int], which is accepted then converted to tuple[int,int] by the
tuple_num_bits validator; update the declaration for num_bits (the ModeloptField
definition) to keep the existing union types but add a one-line comment
explaining that list[int] inputs are normalized to tuple[int,int] by the
tuple_num_bits validator so readers understand the pre-validation shape
conversion.
- Around line 166-178: Replace the two assert-based parity checks with explicit
runtime checks that raise on mismatch: instead of using assert
_default_disabled_quantizer_cfg_deprecated == _default_disabled_quantizer_cfg
and assert _mamba_moe_disabled_quantizer_cfg_deprecated ==
_mamba_moe_disabled_quantizer_cfg, implement explicit comparisons and raise a
RuntimeError (or ValueError) with a clear message referencing the mismatched
symbols (_default_disabled_quantizer_cfg_deprecated,
_default_disabled_quantizer_cfg, _mamba_moe_disabled_quantizer_cfg_deprecated,
_mamba_moe_disabled_quantizer_cfg) so these critical config-parity validations
remain active even when Python is run with -O.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1e11ee75-9da2-4cd9-bc6b-b57c06b19bc5
📒 Files selected for processing (29)
- modelopt/torch/quantization/config.py
- modelopt_recipes/configs/ptq/base_mamba.yml
- modelopt_recipes/configs/ptq/calib_algo_local_hessian.yml
- modelopt_recipes/configs/ptq/calib_algo_mse.yml
- modelopt_recipes/configs/ptq/w4a4_nvfp4_static_nvfp4.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv/model_quant.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv/recipe.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/model_quant.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/recipe.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/model_quant.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/recipe.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/model_quant.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/recipe.yml
- modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/model_quant.yml
- modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/recipe.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/README
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/model_quant.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/recipe.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/README
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/model_quant.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/recipe.yml
✅ Files skipped from review due to trivial changes (3)
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/README
- modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/recipe.yml
🚧 Files skipped from review as they are similar to previous changes (12)
- modelopt_recipes/configs/ptq/calib_algo_mse.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/recipe.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/kv_quant.yml
- modelopt_recipes/configs/ptq/base_mamba.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/README
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/recipe.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv/model_quant.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv/model_quant.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv/model_quant.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv/kv_quant.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv/recipe.yml
```yaml
# limitations under the License.

__base__:
  - configs/ptq/base
```
It seems it would be more straightforward to directly include the config here.
```yaml
# See the License for the specific language governing permissions and
# limitations under the License.

recipe_type: ptq
```
It is not clear to me what this recipe.yml is supposed to do. I recall that you mentioned the calibration dataset here.
Actionable comments posted: 11
♻️ Duplicate comments (2)
modelopt/torch/quantization/config.py (2)
1031-1039: ⚠️ Potential issue | 🟡 Minor

Incorrect return type annotation.

The return type `-> int | tuple[int, int]` is incorrect. This validator operates on `block_sizes`, which is a dict, and returns the (potentially modified) dict `v`.

Proposed fix

```diff
 @field_validator("block_sizes", mode="before")
 @classmethod
-def tuple_block_sizes_scale_bits(cls, v) -> int | tuple[int, int]:
+def tuple_block_sizes_scale_bits(
+    cls, v: dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None
+) -> dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None:
     """Convert block_sizes.scale_bits to tuple if list."""
     if v and v.get("scale_bits"):
         scale_bits = v.get("scale_bits")
         if isinstance(scale_bits, list):
             v["scale_bits"] = tuple(scale_bits)
     return v
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1031 - 1039, The validator tuple_block_sizes_scale_bits annotated as returning "int | tuple[int, int]" is wrong because it receives and returns the block_sizes dict; change the return type to reflect that (e.g., "dict[str, Any] | None" or simply "dict | None") and import Any if needed; keep the function signature on the `@field_validator`("block_sizes", mode="before") classmethod and ensure the implementation still returns the possibly-modified dict v.
860-868: ⚠️ Potential issue | 🟡 Minor

Use `ValueError` instead of `assert` for input validation.

Line 865 uses `assert len(num_bits) == 2`, which can be disabled with `python -O`, allowing invalid inputs to pass silently. This is inconsistent with other validators in this class (e.g., `validate_num_bits`, `validate_bias`) that raise `ValueError`.

Proposed fix

```diff
 @field_validator("num_bits", mode="before")
 @classmethod
 def tuple_num_bits(cls, num_bits: int | list[int] | tuple[int, int]) -> int | tuple[int, int]:
     """Convert num_bits to tuple if list."""
     if isinstance(num_bits, list):
-        assert len(num_bits) == 2
+        if len(num_bits) != 2:
+            raise ValueError(f"num_bits list must have exactly 2 elements, got {len(num_bits)}")
         return cast("tuple[int, int]", tuple(num_bits))
     return num_bits
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 860 - 868, In tuple_num_bits (the `@field_validator` method) replace the assert with explicit input validation that raises ValueError when the incoming list does not have length 2: check isinstance(num_bits, list), if so verify len(num_bits) == 2 and if not raise ValueError("num_bits list must have length 2"), then return tuple(num_bits) (cast as needed); keep returning num_bits for non-list inputs so the method behavior matches other validators like validate_num_bits and validate_bias.
🧹 Nitpick comments (1)
modelopt/torch/opt/config.py (1)
450-473: Base config resolution order allows local files to shadow built-in recipes.

The path search order checks local filesystem paths before `BUILTIN_RECIPES_LIB`. This means a user's local file at `configs/ptq/base.yml` could shadow the built-in recipe. While this may be intentional to support user overrides, it could cause unexpected behavior if unrelated local files accidentally match base config names.

Consider documenting this precedence behavior or adding a warning when a local file shadows a built-in recipe.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/config.py` around lines 450 - 473, The current resolution builds paths_to_check and checks them in an order that prefers local filesystem files over BUILTIN_RECIPES_LIB, which can let a local config_file shadow a built-in recipe; update the logic in the path-building/lookup block (the code that constructs paths_to_check and the subsequent loop that sets config_path) to detect when both a local candidate (Path or Traversable) and a corresponding BUILTIN_RECIPES_LIB path exist and either (a) prefer the builtin by changing the search order, or (b) keep the current precedence but log or raise a warning when a local file shadows BUILTIN_RECIPES_LIB; implement the chosen behavior using the existing variables config_file, paths_to_check, BUILTIN_RECIPES_LIB and config_path so the change is localized and easy to review.
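A first-match search with a shadow warning, as suggested above, can be sketched as follows. The function name and the `builtin_root` parameter are illustrative stand-ins for the real `config_file`/`BUILTIN_RECIPES_LIB` machinery described in the review:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def resolve_config(config_file: str, local_roots: list[Path], builtin_root: Path) -> Path:
    """Return the first existing local candidate, warning when it shadows a builtin."""
    local_hit = next(
        (root / config_file for root in local_roots if (root / config_file).is_file()),
        None,
    )
    builtin_candidate = builtin_root / config_file
    if local_hit is not None:
        if builtin_candidate.is_file():
            # Keep local-first precedence, but make the shadowing visible.
            logger.warning("local %s shadows built-in recipe %s", local_hit, builtin_candidate)
        return local_hit
    if builtin_candidate.is_file():
        return builtin_candidate
    raise FileNotFoundError(config_file)
```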
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 897-899: The assertion message is misleading because the
isinstance check verifies the recipe's type, not whether load_recipe succeeded;
replace the assertion or message to reflect a type mismatch (e.g., raise a
TypeError or assert with "Loaded recipe is not a ModelOptPTQRecipe" or similar)
for the variable recipe and class ModelOptPTQRecipe so the failure clearly
indicates an unexpected recipe type rather than a loading error.
- Around line 908-911: The call passes the whole recipe.kv_quant wrapper to
mtq.update_quant_cfg_with_kv_cache_quant which expects the inner quantizer dict;
change the recipe path to pass recipe.kv_quant["quant_cfg"] (same extraction
used in the non-recipe path) so mtq.update_quant_cfg_with_kv_cache_quant
receives the inner dict and can .update(quant_cfg["quant_cfg"]) correctly;
update the call site that currently uses recipe.kv_quant to use
recipe.kv_quant["quant_cfg"] and keep the variable names quant_cfg and
kv_cache_quant_cfg consistent with the existing
update_quant_cfg_with_kv_cache_quant signature.
In `@modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv.yml`:
- Around line 18-19: The YAML description currently mentions "FP8 activation"
which conflicts with the filename "fp8_2d_blockwise_weight_only-fp8_kv.yml" and
the base config "w8_fp8_2d_blockwise" (weight-only); update the description
field to remove any reference to activation quantization and clearly state this
is weight-only FP8 (with FP8 KV cache and max calibration) so it matches the
filename and the w8 base config.
In `@modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv.yml`:
- Around line 18-19: The description string currently reads "FP8 per channel
weight, FP8 per tokenactivation, FP8 KV Cache, max calibration." and contains a
missing space; update the description value in the YAML (the description field)
to "FP8 per channel weight, FP8 per token activation, FP8 KV Cache, max
calibration." so "tokenactivation" is corrected to "token activation".
In `@modelopt_recipes/general/ptq/int4_awq-fp8_kv.yml`:
- Around line 18-19: The YAML description incorrectly claims "Int4 activation"
while the referenced base config `w4_int4` has activation quantization disabled
(`*input_quantizer: enable: false`); update the top-level `description` string
(near the existing "Int4 weight, Int4 activation, FP8 KV Cache, AWQ
calibration.") to remove or correct the activation quantization claim (for
example: "Int4 weights, FP8 KV Cache, AWQ calibration" or similar), ensuring the
text accurately reflects that only weights are int4 and activations are not
quantized.
In `@modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml`:
- Around line 18-25: The recipe description claims "no calibration" but the
__base__ model_quant list includes configs/ptq/calib_algo_max, causing a
mismatch; fix by making them consistent: either replace the inherited preset
configs/ptq/calib_algo_max with configs/ptq/calib_algo_none in the
__base__.model_quant list (to keep "no calibration") or update the description
text to state "max calibration" (if max calibration is intended); adjust the
description or the third entry accordingly so description and the model_quant
inheritance match.
In `@modelopt_recipes/general/ptq/nvfp4_svdquant_default-fp8_kv.yml`:
- Around line 18-19: The YAML description text in
nvfp4_svdquant_default-fp8_kv.yml incorrectly says "FP8 activation" while the
base config referenced (w4a4_nvfp4_nvfp4) uses NVFP4 activations; update the
description field to state "NVFP4 activation" (or otherwise match the base
config wording) so the description accurately reflects the activation format
used by w4a4_nvfp4_nvfp4.
In `@modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv.yml`:
- Around line 18-20: The description string currently says "fp8 scale sweep
calibration" but the recipe name nvfp4_w4a4_weight_local_hessian-fp8_kv and the
base configs/ptq/calib_algo_local_hessian indicate it uses the local-Hessian
calibration; update the description to accurately mention "local Hessian
calibration" (or "local-Hessian fp8 scale calibration") and remove or replace
"fp8 scale sweep calibration" so the metadata matches the selected calibration
algorithm.
In `@modelopt_recipes/general/ptq/w4a8_awq_beta-fp8_kv.yml`:
- Around line 18-19: The description incorrectly states "FP8 weight" while this
recipe (w4a8_awq_beta-fp8_kv.yml) and its base config
(configs/ptq/w4a8_int4_blockwise_fp8_fp8) use Int4 blockwise weights; update the
description to reflect Int4 weights (e.g., mention "Int4 blockwise weights then
FP8 activation and FP8 KV cache") so the metadata matches the recipe name and
base config.
In `@modelopt/torch/recipe/loader.py`:
- Around line 32-45: The function load_recipe currently treats Path/Traversable
differently than str; normalize recipe_path at start (e.g., if
isinstance(recipe_path, (Path, Traversable)) convert to str or extract a Path
object) and then run the same built-in lookup and .yml/.yaml probing used for
str inputs, only skipping the built-in library resolution when the original
input is an absolute filesystem path (use Path.is_absolute() or os.path.isabs to
detect); apply the same normalization to the branch handling the
legacy-directory flow so both code paths share the same resolution logic and
behavior.
- Around line 84-94: The code assumes load_config(recipe_file) returns a dict
and uses assert for validation; change it to first check isinstance(data, dict)
and raise ValueError if not, then replace the assert recipe_type is not None
with raising ValueError describing the missing metadata.recipe_type, and for the
RecipeType.PTQ branch replace assert checks for "model_quant" and "kv_quant"
with explicit membership checks that raise ValueError with clear messages; apply
the same pattern to where recipe_descriptor is loaded (lines referencing
recipe_descriptor) — verify it is a dict before using .get and replace asserts
with ValueError exceptions.
---
Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1031-1039: The validator tuple_block_sizes_scale_bits annotated as
returning "int | tuple[int, int]" is wrong because it receives and returns the
block_sizes dict; change the return type to reflect that (e.g., "dict[str, Any]
| None" or simply "dict | None") and import Any if needed; keep the function
signature on the `@field_validator`("block_sizes", mode="before") classmethod and
ensure the implementation still returns the possibly-modified dict v.
- Around line 860-868: In tuple_num_bits (the `@field_validator` method) replace
the assert with explicit input validation that raises ValueError when the
incoming list does not have length 2: check isinstance(num_bits, list), if so
verify len(num_bits) == 2 and if not raise ValueError("num_bits list must have
length 2"), then return tuple(num_bits) (cast as needed); keep returning
num_bits for non-list inputs so the method behavior matches other validators
like validate_num_bits and validate_bias.
---
Nitpick comments:
In `@modelopt/torch/opt/config.py`:
- Around line 450-473: The current resolution builds paths_to_check and checks
them in an order that prefers local filesystem files over BUILTIN_RECIPES_LIB,
which can let a local config_file shadow a built-in recipe; update the logic in
the path-building/lookup block (the code that constructs paths_to_check and the
subsequent loop that sets config_path) to detect when both a local candidate
(Path or Traversable) and a corresponding BUILTIN_RECIPES_LIB path exist and
either (a) prefer the builtin by changing the search order, or (b) keep the
current precedence but log or raise a warning when a local file shadows
BUILTIN_RECIPES_LIB; implement the chosen behavior using the existing variables
config_file, paths_to_check, BUILTIN_RECIPES_LIB and config_path so the change
is localized and easy to review.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f1f85b16-6aae-4bc1-ab7f-5b3bd8d0ff00
📒 Files selected for processing (35)
- examples/llm_ptq/hf_ptq.py
- modelopt/torch/opt/config.py
- modelopt/torch/quantization/config.py
- modelopt/torch/recipe/loader.py
- modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
- modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv.yml
- modelopt_recipes/general/ptq/int4_awq-fp8_kv.yml
- modelopt_recipes/general/ptq/int4_blockwise_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/int8_default-fp8_kv.yml
- modelopt_recipes/general/ptq/int8_smoothquant-fp8_kv.yml
- modelopt_recipes/general/ptq/int8_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv.yml
- modelopt_recipes/general/ptq/mxfp4_default-fp8_kv.yml
- modelopt_recipes/general/ptq/mxfp4_mlp_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/mxfp6_default-fp8_kv.yml
- modelopt_recipes/general/ptq/mxfp8_default-fp8_kv.yml
- modelopt_recipes/general/ptq/mxint8_default-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_awq_clip-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_awq_full-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_awq_lite-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_default-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_fp8_mha-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_svdquant_default-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv.yml
- modelopt_recipes/general/ptq/w4a8_awq_beta-fp8_kv.yml
- modelopt_recipes/general/ptq/w4a8_mxfp4_fp8-fp8_kv.yml
- modelopt_recipes/general/ptq/w4a8_nvfp4_fp8-fp8_kv.yml
✅ Files skipped from review due to trivial changes (1)
- modelopt_recipes/general/ptq/nvfp4_awq_lite-fp8_kv.yml
- modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv.yml (outdated; resolved)
- modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv.yml (outdated; resolved)
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/torch/quantization/config.py (1)
768-804: ⚠️ Potential issue | 🟠 Major

Add `MXFP6_DEFAULT_CFG` to `choices`.

The module now exports `MXFP6_DEFAULT_CFG` at Lines 372-389, but it is missing from this whitelist. Any path that validates preset names against `choices` will reject the new config even though it is publicly defined.

🛠️ Suggested fix

```diff
 "MXFP4_DEFAULT_CFG",
+"MXFP6_DEFAULT_CFG",
 "MXFP8_DEFAULT_CFG",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 768 - 804, The choices set in config.py is missing the newly exported preset MXFP6_DEFAULT_CFG; update the choices variable to include "MXFP6_DEFAULT_CFG" so validation against choices accepts that preset (locate the choices declaration near the other preset names and add the exact string "MXFP6_DEFAULT_CFG" alongside the other entries).
♻️ Duplicate comments (6)
examples/llm_ptq/hf_ptq.py (1)
900-910: ⚠️ Potential issue | 🟠 Major

Recipe mode is still coupled to the legacy `qformat` flow.

The inline comment on Line 901 is not accurate: `build_quant_cfg()` in `examples/llm_ptq/example_utils.py` still keys some adjustments off the `qformat` string, and the later `if args.qformat in QUANT_CFG_CHOICES` check still decides whether quantization runs at all. With `--recipe`, that means the loaded recipe is no longer the source of truth: AWQ/SmoothQuant recipes can miss the existing qformat-specific handling, and a conflicting `--qformat` can skip quantization entirely. Please derive one effective format/mode from the recipe and use it consistently, or make `--recipe` mutually exclusive with the qformat-driven flags.

Also applies to: 954-965
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_ptq/hf_ptq.py` around lines 900 - 910, The recipe-driven quant path is still influenced by args.qformat and QUANT_CFG_CHOICES; update the logic so the effective quant format is derived from the recipe (or enforce mutual exclusivity) before calling build_quant_cfg and before the QUANT_CFG_CHOICES check: either (A) extract a qformat/mode from recipe (e.g., recipe.qformat or recipe.type), set a local effective_format and pass that into build_quant_cfg and the subsequent args.qformat checks, or (B) make --recipe mutually exclusive with qformat-driven flags at CLI parse time and skip any args.qformat checks when recipe is provided; apply this change around the build_quant_cfg(...) call and the later if args.qformat in QUANT_CFG_CHOICES block and ensure mtq.update_quant_cfg_with_kv_cache_quant uses the same derived/validated format.

modelopt/torch/recipe/loader.py (1)
32-74: ⚠️ Potential issue | 🟠 Major

Relative `Path` recipe inputs still bypass built-in discovery.

Line 65 sends every non-`str` input straight to `resolved`, so `load_recipe(Path("general/ptq/fp8_default-fp8_kv"))` never gets `.yml`/`.yaml` probing or `modelopt_recipes` lookup. Keep relative `Path` inputs on the same resolver as strings and only skip built-in lookup for absolute filesystem paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/recipe/loader.py` around lines 32 - 74, The function load_recipe treats non-str inputs (Path/Traversable) as already resolved which skips the BUILTIN_RECIPES_LIB lookup and suffix probing; change the resolver logic so that when recipe_path is a Path that is relative (not absolute) it is processed the same way as str inputs: attempt BUILTIN_RECIPES_LIB.joinpath(recipe_path + suffix) and probe suffixes [""|".yml"|".yaml"], falling back to filesystem candidates (Path(recipe_path + suffix)); only bypass built-in lookup when the Path is absolute. Keep returning via _load_recipe_from_file and _load_recipe_from_dir as before.

modelopt/torch/quantization/config.py (2)
860-868: ⚠️ Potential issue | 🟡 Minor

Use `ValueError` instead of `assert` in `tuple_num_bits`.

The guard on Line 865 disappears under `python -O`, which means custom-backend configs can accept arbitrary-length tuples coming from YAML lists. Raise a `ValueError` here instead of relying on `assert`.

🛠️ Suggested fix

```diff
 def tuple_num_bits(cls, num_bits: int | list[int] | tuple[int, int]) -> int | tuple[int, int]:
     """Convert num_bits to tuple if list."""
     if isinstance(num_bits, list):
-        assert len(num_bits) == 2
+        if len(num_bits) != 2:
+            raise ValueError(f"num_bits list must have length 2, got {len(num_bits)}")
         return cast("tuple[int, int]", tuple(num_bits))
     return num_bits
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 860 - 868, The tuple_num_bits field validator currently uses assert to check list length which is ignored with python -O; change the check in tuple_num_bits (the `@field_validator` classmethod) to explicitly raise a ValueError when num_bits is a list but len(num_bits) != 2, providing a clear error message (e.g. "num_bits list must have length 2") before casting to tuple[int, int]; keep the rest of the logic intact so valid lists are converted to tuples and non-list inputs are returned unchanged.
1031-1039: ⚠️ Potential issue | 🟡 Minor

Guard raw `block_sizes` inputs before calling `.get()`.

This `mode="before"` validator runs before Pydantic has enforced the field type. A truthy non-dict input like `[1, 2]` will hit `v.get(...)` on Line 1035 and raise `AttributeError` instead of a validation error. The return annotation should also match the dict/`None` shape.

🛠️ Suggested fix

```diff
-    def tuple_block_sizes_scale_bits(cls, v) -> int | tuple[int, int]:
+    def tuple_block_sizes_scale_bits(
+        cls, v: dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None
+    ) -> dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None:
         """Convert block_sizes.scale_bits to tuple if list."""
-        if v and v.get("scale_bits"):
+        if isinstance(v, dict) and v.get("scale_bits"):
             scale_bits = v.get("scale_bits")
             if isinstance(scale_bits, list):
                 v["scale_bits"] = tuple(scale_bits)
         return v
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1031 - 1039, The validator tuple_block_sizes_scale_bits currently assumes v is a dict and calls v.get(...), which will raise AttributeError for non-dict raw inputs (e.g., lists) because this runs mode="before"; update the function to first guard v with a type check (e.g., if not isinstance(v, dict) or v is None: return v) and only then access v.get("scale_bits") and convert list -> tuple; also change the return type annotation from int | tuple[int,int] to a shape matching the validator (e.g., dict | None) so the signature reflects that the function returns the (possibly mutated) dict or None, and keep references to block_sizes and scale_bits in the implementation of tuple_block_sizes_scale_bits.

modelopt/torch/opt/config.py (2)
395-407: ⚠️ Potential issue | 🟠 Major

Anchor `__base__` lookups to the loaded file.

Line 397 and Line 462 still pass relative bases back through `load_config()` without the current `config_path`, so built-in bases can be shadowed by CWD files and external recipe trees cannot resolve their own relative bases. Resolve each relative `base_cfg` against the matched file's parent (or the built-in library root) before recursing.

Also applies to: 440-472, 475-510
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/config.py` around lines 395 - 407, The current _merge_base_list and related code call load_config(base_cfg) with raw relative paths, which lets CWD files shadow built-ins and prevents recipe-relative resolution; update the logic (in _merge_base_list and the code paths handling __base__ at lines around where base_cfg is iterated) to resolve each relative base_cfg against the parent directory of the file that matched the config (or against the built-in library root for builtin identifiers) before passing it to load_config: if base_cfg is relative, join it with the matched config's parent path (or library root) and normalize the path, then call load_config(resolved_path) so anchors are properly scoped to the loaded file. Ensure the same resolution is applied wherever base_cfg entries are recursed (the other functions handling __base__ merging).
509-511: ⚠️ Potential issue | 🟠 Major

Don't coerce falsey non-mapping YAML into `{}`.

Line 509 uses `yaml.safe_load(...) or {}`. That turns `false`, `0`, `[]`, or `""` into an empty config, while non-empty scalars/lists still blow up later in `_resolve_bases()`. `load_config()` promises a mapping, so only `None` should become `{}`; other top-level types should raise `ValueError`.

🛠️ Suggested fix

```diff
-    config_data = yaml.safe_load(config_path.read_text(encoding="utf-8")) or {}
-
-    return _resolve_bases(config_data)
+    loaded = yaml.safe_load(config_path.read_text(encoding="utf-8"))
+    if loaded is None:
+        config_data: dict[str, Any] = {}
+    elif not isinstance(loaded, dict):
+        raise ValueError(f"Config file {config_path} must contain a mapping at the top level.")
+    else:
+        config_data = loaded
+
+    return _resolve_bases(config_data)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/config.py` around lines 509 - 511, The current use of yaml.safe_load(... ) or {} coerces falsey non-mapping values into an empty config; instead, after calling yaml.safe_load(config_path.read_text(...)) capture the result into config_data, set config_data = {} only if the result is None, and if the result is not a mapping (not an instance of dict) raise a ValueError describing the unexpected top-level YAML type; then pass the validated mapping to _resolve_bases. Reference yaml.safe_load, the local variable config_data, the load_config function context, and _resolve_bases to locate where to implement this check.
🧹 Nitpick comments (1)
tests/unit/torch/recipe/test_loader.py (1)
316-335: Please smoke-test the new built-in recipes.

These tests only exercise `general/ptq/fp8_default-fp8_kv`, so the new `modelopt_recipes/general/ptq/*-fp8_kv.yml` assets can still ship with broken `__base__` references or metadata without failing CI.

🧪 Possible addition

```diff
+@pytest.mark.parametrize(
+    "recipe_name",
+    [
+        "general/ptq/mxfp8_default-fp8_kv",
+        "general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv",
+        "general/ptq/int8_weight_only-fp8_kv",
+        "general/ptq/nvfp4_awq_clip-fp8_kv",
+        "general/ptq/mamba_moe_nvfp4_conservative-fp8_kv",
+    ],
+)
+def test_load_recipe_new_builtins(recipe_name):
+    recipe = load_recipe(recipe_name)
+    assert recipe.recipe_type == RecipeType.PTQ
+    assert isinstance(recipe, ModelOptPTQRecipe)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/torch/recipe/test_loader.py` around lines 316 - 335, The tests only load a single built-in recipe; iterate over all built-in PTQ recipe assets (e.g., files matching the pattern modelopt_recipes/general/ptq/*-fp8_kv.yml) and call load_recipe for each to smoke-test them: for every path call load_recipe (same function used in test_load_recipe_builtin_with_suffix/test_load_recipe_builtin_without_suffix), assert the returned recipe.recipe_type is RecipeType.PTQ and that recipe.description is a non-empty string (and optionally that ModelOptPTQRecipe-specific fields like model_quant/kv_quant are present when appropriate) so any broken __base__ references or missing metadata fail CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/opt/config.py`:
- Around line 484-497: The code currently treats Path/Traversable configs as
final candidates and skips suffix probing and builtin lookups; change the branch
handling config_file (the Path/Traversable case) to mirror the str branch: if
config_file is a relative Path (not config_file.is_absolute()), probe for
suffixes by appending ".yml" and ".yaml" and also add
BUILTIN_RECIPES_LIB.joinpath(...) variants; if the Path already has a .yml/.yaml
suffix add it and the BUILTIN_RECIPES_LIB joinpath, and only bypass builtin
resolution for absolute Paths. Ensure you still append candidates to
paths_to_check and reuse the same variable names (config_file, paths_to_check,
BUILTIN_RECIPES_LIB).
In `@modelopt/torch/quantization/utils.py`:
- Around line 847-849: quant_cfg["quant_cfg"] can be explicitly set to None
which causes an AttributeError when calling .update(); change the logic to treat
falsy values as the default dict before updating: read the existing inner =
quant_cfg.get("quant_cfg") or {"default": {"enable": False}}, assign it back to
quant_cfg["quant_cfg"] and then .update(kv_cache_quant_cfg); reference the local
variable quant_cfg and the update call that currently uses kv_cache_quant_cfg to
locate the spot to change.
---
Outside diff comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 768-804: The choices set in config.py is missing the newly
exported preset MXFP6_DEFAULT_CFG; update the choices variable to include
"MXFP6_DEFAULT_CFG" so validation against choices accepts that preset (locate
the choices declaration near the other preset names and add the exact string
"MXFP6_DEFAULT_CFG" alongside the other entries).
---
Duplicate comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 900-910: The recipe-driven quant path is still influenced by
args.qformat and QUANT_CFG_CHOICES; update the logic so the effective quant
format is derived from the recipe (or enforce mutual exclusivity) before calling
build_quant_cfg and before the QUANT_CFG_CHOICES check: either (A) extract a
qformat/mode from recipe (e.g., recipe.qformat or recipe.type), set a local
effective_format and pass that into build_quant_cfg and the subsequent
args.qformat checks, or (B) make --recipe mutually exclusive with qformat-driven
flags at CLI parse time and skip any args.qformat checks when recipe is
provided; apply this change around the build_quant_cfg(...) call and the later
if args.qformat in QUANT_CFG_CHOICES block and ensure
mtq.update_quant_cfg_with_kv_cache_quant uses the same derived/validated format.
In `@modelopt/torch/opt/config.py`:
- Around line 395-407: The current _merge_base_list and related code call
load_config(base_cfg) with raw relative paths, which lets CWD files shadow
built-ins and prevents recipe-relative resolution; update the logic (in
_merge_base_list and the code paths handling __base__ at lines around where
base_cfg is iterated) to resolve each relative base_cfg against the parent
directory of the file that matched the config (or against the built-in library
root for builtin identifiers) before passing it to load_config: if base_cfg is
relative, join it with the matched config's parent path (or library root) and
normalize the path, then call load_config(resolved_path) so anchors are properly
scoped to the loaded file. Ensure the same resolution is applied wherever
base_cfg entries are recursed (the other functions handling __base__ merging).
- Around line 509-511: The current use of yaml.safe_load(... ) or {} coerces
falsey non-mapping values into an empty config; instead, after calling
yaml.safe_load(config_path.read_text(...)) capture the result into config_data,
set config_data = {} only if the result is None, and if the result is not a
mapping (not an instance of dict) raise a ValueError describing the unexpected
top-level YAML type; then pass the validated mapping to _resolve_bases.
Reference yaml.safe_load, the local variable config_data, the load_config
function context, and _resolve_bases to locate where to implement this check.
In `@modelopt/torch/quantization/config.py`:
- Around line 860-868: The tuple_num_bits field validator currently uses assert
to check list length which is ignored with python -O; change the check in
tuple_num_bits (the `@field_validator` classmethod) to explicitly raise a
ValueError when num_bits is a list but len(num_bits) != 2, providing a clear
error message (e.g. "num_bits list must have length 2") before casting to
tuple[int, int]; keep the rest of the logic intact so valid lists are converted
to tuples and non-list inputs are returned unchanged.
- Around line 1031-1039: The validator tuple_block_sizes_scale_bits currently
assumes v is a dict and calls v.get(...), which will raise AttributeError for
non-dict raw inputs (e.g., lists) because this runs mode="before"; update the
function to first guard v with a type check (e.g., if not isinstance(v, dict) or
v is None: return v) and only then access v.get("scale_bits") and convert list
-> tuple; also change the return type annotation from int | tuple[int,int] to a
shape matching the validator (e.g., dict | None) so the signature reflects that
the function returns the (possibly mutated) dict or None, and keep references to
block_sizes and scale_bits in the implementation of
tuple_block_sizes_scale_bits.
In `@modelopt/torch/recipe/loader.py`:
- Around line 32-74: The function load_recipe treats non-str inputs
(Path/Traversable) as already resolved which skips the BUILTIN_RECIPES_LIB
lookup and suffix probing; change the resolver logic so that when recipe_path is
a Path that is relative (not absolute) it is processed the same way as str
inputs: attempt BUILTIN_RECIPES_LIB.joinpath(recipe_path + suffix) and probe
suffixes [""|".yml"|".yaml"], falling back to filesystem candidates
(Path(recipe_path + suffix)); only bypass built-in lookup when the Path is
absolute. Keep returning via _load_recipe_from_file and _load_recipe_from_dir as
before.
---
Nitpick comments:
In `@tests/unit/torch/recipe/test_loader.py`:
- Around line 316-335: The tests only load a single built-in recipe; iterate
over all built-in PTQ recipe assets (e.g., files matching the pattern
modelopt_recipes/general/ptq/*-fp8_kv.yml) and call load_recipe for each to
smoke-test them: for every path call load_recipe (same function used in
test_load_recipe_builtin_with_suffix/test_load_recipe_builtin_without_suffix),
assert the returned recipe.recipe_type is RecipeType.PTQ and that
recipe.description is a non-empty string (and optionally that
ModelOptPTQRecipe-specific fields like model_quant/kv_quant are present when
appropriate) so any broken __base__ references or missing metadata fail CI.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dfdf41d8-4d47-46b4-b9e6-f063d9dde90f
📒 Files selected for processing (41)

- examples/llm_ptq/example_utils.py
- examples/llm_ptq/hf_ptq.py
- modelopt/torch/opt/config.py
- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/utils.py
- modelopt/torch/recipe/loader.py
- modelopt_recipes/configs/ptq/base.yml
- modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
- modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv.yml
- modelopt_recipes/general/ptq/int4_awq-fp8_kv.yml
- modelopt_recipes/general/ptq/int4_blockwise_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/int8_default-fp8_kv.yml
- modelopt_recipes/general/ptq/int8_smoothquant-fp8_kv.yml
- modelopt_recipes/general/ptq/int8_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_conservative-fp8_kv.yml
- modelopt_recipes/general/ptq/mxfp4_default-fp8_kv.yml
- modelopt_recipes/general/ptq/mxfp4_mlp_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/mxfp6_default-fp8_kv.yml
- modelopt_recipes/general/ptq/mxfp8_default-fp8_kv.yml
- modelopt_recipes/general/ptq/mxint8_default-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_awq_clip-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_awq_full-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_awq_lite-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_default-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_fp8_mha-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_svdquant_default-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_mse_fp8_sweep-fp8_kv.yml
- modelopt_recipes/general/ptq/w4a8_awq_beta-fp8_kv.yml
- modelopt_recipes/general/ptq/w4a8_mxfp4_fp8-fp8_kv.yml
- modelopt_recipes/general/ptq/w4a8_nvfp4_fp8-fp8_kv.yml
- pyproject.toml
- tests/unit/torch/recipe/__init__.py
- tests/unit/torch/recipe/test_loader.py
✅ Files skipped from review due to trivial changes (2)
- modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv.yml
- tests/unit/torch/recipe/__init__.py
🚧 Files skipped from review as they are similar to previous changes (27)
- modelopt_recipes/general/ptq/nvfp4_default-fp8_kv.yml
- modelopt_recipes/configs/ptq/base.yml
- modelopt_recipes/general/ptq/int8_smoothquant-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_w4a4_weight_local_hessian-fp8_kv.yml
- modelopt_recipes/general/ptq/mxint8_default-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_mlp_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/mamba_moe_fp8_conservative-fp8_kv.yml
- pyproject.toml
- modelopt_recipes/general/ptq/w4a8_awq_beta-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_awq_lite-fp8_kv.yml
- modelopt_recipes/general/ptq/int4_awq-fp8_kv.yml
- modelopt_recipes/general/ptq/mxfp6_default-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_svdquant_default-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_fp8_mha-fp8_kv.yml
- modelopt_recipes/general/ptq/fp8_per_channel_per_token-fp8_kv.yml
- examples/llm_ptq/example_utils.py
- modelopt_recipes/general/ptq/mamba_moe_fp8_aggressive-fp8_kv.yml
- modelopt_recipes/general/ptq/fp8_2d_blockwise_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/mxfp4_mlp_weight_only-fp8_kv.yml
- modelopt_recipes/general/ptq/int8_default-fp8_kv.yml
- modelopt_recipes/general/ptq/w4a8_nvfp4_fp8-fp8_kv.yml
- modelopt_recipes/general/ptq/mxfp4_default-fp8_kv.yml
- modelopt_recipes/general/ptq/nvfp4_awq_full-fp8_kv.yml
- modelopt_recipes/general/ptq/mamba_moe_nvfp4_aggressive-fp8_kv.yml
- modelopt_recipes/general/ptq/w4a8_mxfp4_fp8-fp8_kv.yml
- modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
```yaml
  - configs/ptq/w8a8_fp8_fp8
  - configs/ptq/calib_algo_max
kv_quant:
```

why do we need a separate kv_quant field?
```yaml
kv_quant:
  - configs/ptq/kv_fp8
model_quant:
```

These `model_quant` and `kv_quant` fields look confusing to me.
Why do we need these fields? Can it be flat?

```
configs/ptq/quant_cfg/base
configs/ptq/quant_cfg/base_mamba
configs/ptq/quant_cfg/w8a8_fp8_fp8
configs/ptq/algorithm/calib_algo_max
configs/ptq/quant_cfg/kv_fp8
```
1. start a new config system using yaml/yml files. The config system
adopts Hydra-style override, using the __base__ tag, which matches
pydantic's __base__ model mechanism. __base__ is a list;
semantically it is inheritance. The implementation uses OmegaConf to
merge configs, in the order of the __base__ list.
2. add a new top level package: modelopt_recipes
I want it to be a top level package so we can make it clear that the
modelopt package holds the code while this new package holds the recipes
3. make a list of reusable quant config units, sitting at
modelopt_recipes/configs/ptq/.... Mostly 3 types:
3.1 model weights and activations quant config
3.2 kv quant config
3.3 algorithm config
with the __base__ config mechanism, we can have a much clearer
config system
4. implement some of the existing quantization recipes using the new
config system as model-agnostic general recipes, but not actually in
use. These recipes sit inside modelopt_recipes/general/ptq/...
5. make sure the configs from the new config system match the existing configs
6. extend the hf_ptq script to enable recipe-based PTQ
7. tested hf_ptq using both builtin and external config files. Example
script:
python examples/llm_ptq/hf_ptq.py \
--model Qwen/Qwen3-8B \
--recipe general/ptq/fp8_default-fp8_kv \
--export_path=fp8_default-fp8_kv \
--calib_size=16 \
--batch_size=0 \
--trust_remote_code \
--export_fmt=hf
8. A recipe can be a single yaml file or a directory with a
recipe.{yml,yaml} file
9. use lists instead of tuples in the hard-coded configs. YAML files
do not have a tuple type. For the bits, we accept both tuples and
lists, using lists in code so we can directly compare YAML file
contents with code configs.
10. make the config-file-based configs the default and deprecate the
hard-coded ones. Convert all PTQ configs
11. Added tests for various recipe loading
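The `__base__` mechanism described in the notes above can be sketched with plain dicts. This is a minimal illustration of the merge semantics only: `CONFIG_LIB`, `deep_merge`, and `resolve_bases` are hypothetical names standing in for the YAML files and the OmegaConf-based implementation.

```python
# In-memory stand-in for the YAML config library under modelopt_recipes/.
CONFIG_LIB = {
    "configs/ptq/base": {
        "quant_cfg": {"default": {"enable": False}},
    },
    "configs/ptq/w8a8_fp8_fp8": {
        "__base__": ["configs/ptq/base"],
        "quant_cfg": {"*weight_quantizer": {"num_bits": [4, 3]}},
    },
}


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override onto base; later values win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_bases(name: str) -> dict:
    """Merge every __base__ entry in list order, then the config itself."""
    cfg = dict(CONFIG_LIB[name])
    bases = cfg.pop("__base__", [])
    result: dict = {}
    for base in bases:
        result = deep_merge(result, resolve_bases(base))
    return deep_merge(result, cfg)
```

Resolving `configs/ptq/w8a8_fp8_fp8` yields the base's `default` entry plus the override's weight-quantizer entry, with the `__base__` key consumed; note the `num_bits` value stays a list, matching point 9 above.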
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
force-pushed from 5fe9f37 to 03ad6a9
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
force-pushed from 398cbaa to 901b948
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
force-pushed from 9fcf0f1 to a6529be
```python
# Check if any bmm_quantizer is in the quant_cfg. If so, we need to enable the bmm_quantizer.
if enable_quant_kv_cache:
    quant_cfg = mtq.update_quant_cfg_with_kv_cache_quant(
```
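The `mtq.update_quant_cfg_with_kv_cache_quant` call shown above is the spot the utils.py prompt flags. A guarded version of that update can be sketched as follows; the function body and the `{"default": {"enable": False}}` fallback are assumptions reconstructed from the review prompt, not the library's actual implementation.

```python
def update_quant_cfg_with_kv_cache_quant(quant_cfg: dict, kv_cache_quant_cfg: dict) -> dict:
    """Merge kv-cache quantizer entries into quant_cfg's inner quant_cfg dict."""
    # Treat an explicit `quant_cfg: None` (or any falsy value) the same as a
    # missing key, so .update() is never called on None.
    inner = quant_cfg.get("quant_cfg") or {"default": {"enable": False}}
    inner.update(kv_cache_quant_cfg)
    quant_cfg["quant_cfg"] = inner
    return quant_cfg
```

With this guard, a recipe YAML that serializes `quant_cfg: null` no longer raises `AttributeError` when kv-cache quantization is enabled.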
What does this PR do?
start a new config system using yaml/yml files. The config system adopts Hydra-style override, using the `__base__` tag, which matches pydantic's `__base__` model mechanism. `__base__` is a list; semantically it is inheritance. The implementation uses OmegaConf to merge configs, in the order of the `__base__` list.

add a new top level package: modelopt_recipes
I want it to be a top level package so we can make it clear that the modelopt package holds the code while this new package holds the recipes
make a list of reusable quant config units, sitting at modelopt_recipes/configs/ptq/.... Mostly 3 types:

3.1 model weights and activations quant config
3.2 kv quant config
3.3 algorithm config

with the base config mechanism, we can have a much clearer config system
implement some of the existing quantization recipes using the new config system as model-agnostic general recipes, but not actually in use. These recipes sit inside modelopt_recipes/general/ptq/...
make sure the configs from the new config system match the existing configs
extend the hf_ptq script to enable recipe based PTQ
tested hf_ptq using both builtin and external config files. Example script:
Usage

```shell
python examples/llm_ptq/hf_ptq.py \
    --model Qwen/Qwen3-8B \
    --recipe general/ptq/fp8_default-fp8_kv \
    ...
```

Testing

```shell
python examples/llm_ptq/hf_ptq.py \
    --model Qwen/Qwen3-8B \
    --recipe general/ptq/fp8_default-fp8_kv \
    --export_path=fp8_default-fp8_kv \
    --calib_size=16 \
    --batch_size=0 \
    --trust_remote_code \
    --export_fmt=hf
```

Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).

CONTRIBUTING.md: ✅ / ❌ / N/A

Additional Information
Summary by CodeRabbit
New Features
Documentation
Tests
Chores