Conversation
> Note: Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

> Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews:
📝 Walkthrough

Adds a block-granular randomized Hadamard transform (RHT) for non-power-of-2 dimensions: a new RotateConfig with `block_size` in the config, `normalized_hadamard_transform` supports block-wise FHT plus `block_size` inference/validation, `TensorQuantizer` exposes and forwards `block_size`, and tests cover block-mode behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Config as RotateConfig
    participant TensorQ as TensorQuantizer
    participant FNS as normalized_hadamard_transform
    participant FHT as FastHadamardTransform
    Config->>TensorQ: provide rotate settings (enable, rotate_fp32, block_size)
    TensorQ->>FNS: forward(inputs, rotate_fp32, block_size)
    FNS->>FNS: validate or infer block_size (largest pow2 divisor)
    FNS->>FNS: reshape inputs into blocks (num_blocks, block_size)
    FNS->>FHT: apply FHT per block
    FHT-->>FNS: transformed blocks
    FNS-->>TensorQ: return transformed & rescaled tensor
```
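The "validate or infer block_size (largest pow2 divisor)" step in the diagram can be sketched in a few lines of plain Python; the helper name below is illustrative, not modelopt's actual API:

```python
def infer_block_size(dim: int) -> int:
    """Largest power-of-2 divisor of dim (hypothetical helper).

    For a power-of-2 dim this returns dim itself (full-size FHT);
    otherwise it returns the largest power-of-2 block that tiles dim,
    e.g. 1920 = 128 * 15 -> 128.
    """
    if dim <= 0:
        raise ValueError("dim must be positive")
    # dim & -dim isolates the lowest set bit, which for a positive
    # integer equals its largest power-of-2 divisor.
    return dim & -dim


print(infer_block_size(1920))  # -> 128
print(infer_block_size(1536))  # -> 512
print(infer_block_size(64))    # -> 64
```

These dimensions match the test cases in the PR (1920, 1536, 64).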
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 4 passed
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff            @@
##             main    #1014     +/-  ##
=========================================
- Coverage   70.09%   70.07%   -0.02%
=========================================
  Files         221      221
  Lines       25459    25499     +40
=========================================
+ Hits        17845    17869     +24
- Misses       7614     7630     +16
```

☔ View full report in Codecov by Sentry.
Force-pushed from 35a0aa6 to a614f6a (Compare)
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
Force-pushed from a614f6a to 719b0d7 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/torch/quantization/config.py (1)
990-995: ⚠️ Potential issue | 🟡 Minor — The public rotate docs still describe only the old full-size Hadamard path.

When `block_size` is set, or auto-selected for a non-power-of-2 dimension, the implementation applies a block-diagonal transform and normalizes each block by `sqrt(block_size)`. The formula here still describes `hadamard(input.shape[-1]) / sqrt(input.shape[-1])`, which is incorrect for the new mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 990 - 995, The docstring for the rotate option currently states the input is transformed with scipy.linalg.hadamard(input.shape[-1]) / sqrt(input.shape[-1]), which is incorrect when block_size is set or auto-selected; update the documentation for the rotate parameter to state that when block_size is specified (or auto-selected for non-power-of-2 dims) a block-diagonal Hadamard transform is applied (i.e., each block uses scipy.linalg.hadamard(block_size)) and each block is normalized by 1/sqrt(block_size), whereas the full Hadamard (hadamard(N)/sqrt(N)) only applies when using the full dimension without blocking. Ensure you reference the rotate parameter, block_size behavior, and scipy.linalg.hadamard in the docstring so readers know the per-block normalization and auto-selection behavior.
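As a sanity check of the per-block math the corrected docstring should describe, the pure-Python sketch below (not the modelopt implementation) applies a Hadamard transform scaled by `1/sqrt(block_size)` to each block and verifies the block-diagonal transform is orthonormal, i.e. preserves squared norms:

```python
import math


def fwht(vec):
    """Unnormalized fast Walsh-Hadamard transform; len(vec) must be a power of 2."""
    vec = list(vec)
    h = 1
    while h < len(vec):
        for i in range(0, len(vec), h * 2):
            for j in range(i, i + h):
                a, b = vec[j], vec[j + h]
                vec[j], vec[j + h] = a + b, a - b
        h *= 2
    return vec


def block_rht(row, block_size):
    """Apply hadamard(block_size) / sqrt(block_size) to each block of the row."""
    scale = 1.0 / math.sqrt(block_size)
    out = []
    for start in range(0, len(row), block_size):
        out.extend(v * scale for v in fwht(row[start:start + block_size]))
    return out


row = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75]  # dim=6, largest pow2 divisor is 2
rotated = block_rht(row, block_size=2)
# Each normalized block is orthonormal, so the squared norm is preserved.
assert math.isclose(sum(v * v for v in row), sum(v * v for v in rotated))
```

The full-dimension case `hadamard(N) / sqrt(N)` is simply the special case `block_size == N`.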
🧹 Nitpick comments (1)
tests/gpu/torch/quantization/test_hadamard.py (1)
56-66: Add one config-driven coverage case for `rotate.block_size`.

This only exercises `normalized_hadamard_transform()` directly, so it will not catch regressions in `QuantizerAttributeConfig.rotate` parsing or the new `TensorQuantizer.forward(..., block_size=self.rotate_block_size)` plumbing. A small `set_quantizer_by_cfg_context(..., {"rotate": {"enable": True, "block_size": 32}})` case would close that gap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/gpu/torch/quantization/test_hadamard.py` around lines 56 - 66, Add a config-driven coverage case that exercises the rotate.block_size plumbing: in test_hadamard_transform_block add (or a new short test) a context using set_quantizer_by_cfg_context({"rotate": {"enable": True, "block_size": 32}}) then call normalized_hadamard_transform on the same input tensor without passing block_size (so the call path uses QuantizerAttributeConfig.rotate / TensorQuantizer.forward via rotate_block_size), and assert the xxt preservation as before; this ensures rotate parsing and forward(..., block_size=self.rotate_block_size) plumbing are exercised in addition to direct normalized_hadamard_transform calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 978-988: Add a pydantic field validator for the rotate
ModeloptField to validate that if "block_size" is present it must be an int or
None and must be >0 (reject booleans and non-positive values), and update the
rotate field docstring to describe both full-dimension Hadamard and
block-granular RHT behaviors; specifically, add a `@field_validator`("rotate")
that inspects the dict (or coerced bool) and raises a ValueError when block_size
is not None and not an int > 0 (or when it is a bool), ensure the type
annotation allows None for block_size, and mention in the docstring how
rotate_block_size and normalized_hadamard_transform handle None (auto-select
largest power-of-2 divisor) vs an explicit positive block_size for
block-granular transforms.
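The validation rule proposed above can be sketched framework-agnostically. The review suggests wiring it into a pydantic `@field_validator`, but the core check looks like this (function name illustrative):

```python
def validate_block_size(value):
    """Accept None (auto-select) or a positive int; reject bools and
    non-positive values.

    Note: bool is a subclass of int in Python, so it must be checked
    explicitly, otherwise True/False would slip through as 1/0.
    """
    if value is None:
        return value
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"block_size must be None or a positive int, got {value!r}")
    return value
```

Inside a pydantic validator the same body would run on `v.get("block_size")` after coercing the `rotate` value to a dict.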
---
Outside diff comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 990-995: The docstring for the rotate option currently states the
input is transformed with scipy.linalg.hadamard(input.shape[-1]) /
sqrt(input.shape[-1]), which is incorrect when block_size is set or
auto-selected; update the documentation for the rotate parameter to state that
when block_size is specified (or auto-selected for non-power-of-2 dims) a
block-diagonal Hadamard transform is applied (i.e., each block uses
scipy.linalg.hadamard(block_size)) and each block is normalized by
1/sqrt(block_size), whereas the full Hadamard (hadamard(N)/sqrt(N)) only applies
when using the full dimension without blocking. Ensure you reference the rotate
parameter, block_size behavior, and scipy.linalg.hadamard in the docstring so
readers know the per-block normalization and auto-selection behavior.
---
Nitpick comments:
In `@tests/gpu/torch/quantization/test_hadamard.py`:
- Around line 56-66: Add a config-driven coverage case that exercises the
rotate.block_size plumbing: in test_hadamard_transform_block add (or a new short
test) a context using set_quantizer_by_cfg_context({"rotate": {"enable": True,
"block_size": 32}}) then call normalized_hadamard_transform on the same input
tensor without passing block_size (so the call path uses
QuantizerAttributeConfig.rotate / TensorQuantizer.forward via
rotate_block_size), and assert the xxt preservation as before; this ensures
rotate parsing and forward(..., block_size=self.rotate_block_size) plumbing are
exercised in addition to direct normalized_hadamard_transform calls.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ee2ffa98-eca1-431e-8c8e-9afc20b99a49
📒 Files selected for processing (5)
- CHANGELOG.rst
- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/nn/functional.py
- modelopt/torch/quantization/nn/modules/tensor_quantizer.py
- tests/gpu/torch/quantization/test_hadamard.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 978-1012: Replace the raw dict for the rotate field with a nested
Pydantic model to enforce allowed keys and per-field types: define a
RotateConfig model (fields: enable: bool = False, rotate_fp32: bool = False,
block_size: Optional[int] = None) with Config extra=forbid to reject unknown
keys, then change the field type from dict[str, ...] to bool | RotateConfig in
the rotate ModeloptField declaration; update the validate_rotate classmethod to
accept bool (return as-is or wrap into RotateConfig) and to convert/validate
incoming dicts by instantiating RotateConfig (e.g., RotateConfig.model_validate
or RotateConfig(**v)), raising any validation errors so callers like
normalized_hadamard_transform get a guaranteed typed config.
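A minimal sketch of the proposed nested model, using a stdlib dataclass in place of pydantic (a real implementation would use a pydantic model with `extra="forbid"`; dataclasses reject unknown keyword arguments with `TypeError`, which approximates that behavior). Names follow the review's suggestion and are not modelopt's actual code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RotateConfig:
    """Typed rotate settings: enable flag, fp32 rotation, optional block size."""
    enable: bool = False
    rotate_fp32: bool = False
    block_size: Optional[int] = None

    def __post_init__(self):
        # Mirror the suggested validation: block_size is None or a positive
        # int; bools are rejected explicitly (bool subclasses int in Python).
        bs = self.block_size
        if bs is not None and (isinstance(bs, bool) or not isinstance(bs, int) or bs <= 0):
            raise ValueError(f"block_size must be None or a positive int, got {bs!r}")


def coerce_rotate(value):
    """Accept a bool shorthand or a dict, as validate_rotate is described to do."""
    if isinstance(value, bool):
        return RotateConfig(enable=value)
    # Unknown keys raise TypeError here, analogous to pydantic's extra="forbid".
    return RotateConfig(**value)


cfg = coerce_rotate({"enable": True, "block_size": 32})
```

With this in place, callers like `normalized_hadamard_transform` receive a guaranteed typed config instead of a raw dict.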
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c48b57a9-36f5-4714-b058-29dd0a740b9f
📒 Files selected for processing (1)
modelopt/torch/quantization/config.py
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
🧹 Nitpick comments (1)
tests/gpu/torch/quantization/test_hadamard.py (1)
56-68: Consider adding a small `atol` fallback and `rotate_fp32` coverage.

Two observations on test robustness:

1. Using `atol=0.0` relies entirely on relative tolerance. If any element in `xxt` or `xxt_h` is very close to zero, even tiny absolute differences could cause spurious failures. A small absolute tolerance (e.g., `atol=1e-6`) provides a safety margin.
2. The existing `test_hadamard_transform` validates both default and `rotate_fp32=True` modes. Consider extending this test (or adding a parameter) to cover `rotate_fp32=True` with `block_size` to ensure numerical stability in float32 rotation mode.

Suggested improvement:
```diff
 @pytest.mark.parametrize(
     ("dim", "block_size"),
     [(1920, 128), (1536, 128), (1920, None), (64, 32)],
 )
-def test_hadamard_transform_block(dim, block_size):
+@pytest.mark.parametrize("rotate_fp32", [False, True])
+def test_hadamard_transform_block(dim, block_size, rotate_fp32):
     """Block-granular RHT for non-power-of-2 dimensions (e.g. MoE expert channels)."""
     x = torch.rand(4, dim, device="cuda")
     xxt = x @ x.T
-    x_h = normalized_hadamard_transform(x, block_size=block_size)
+    x_h = normalized_hadamard_transform(x, rotate_fp32=rotate_fp32, block_size=block_size)
     xxt_h = x_h @ x_h.T
     # Use rtol instead of atol: float32 accumulated error scales with value magnitude,
     # which grows with dim. 1e-3 relative tolerance is appropriate for float32 block RHT.
-    assert torch.allclose(xxt_h, xxt, rtol=1e-3, atol=0.0)
+    assert torch.allclose(xxt_h, xxt, rtol=1e-3, atol=1e-6)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/gpu/torch/quantization/test_hadamard.py` around lines 56 - 68, Update test_hadamard_transform_block to add a small absolute tolerance and cover rotate_fp32: change the assertion to use atol=1e-6 instead of 0.0 and parametrize the test (or add a loop) to call normalized_hadamard_transform(..., rotate_fp32=bool) so both rotate_fp32=True and False are exercised alongside block_size; reference the test function test_hadamard_transform_block and the normalized_hadamard_transform call when making these changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/gpu/torch/quantization/test_hadamard.py`:
- Around line 56-68: Update test_hadamard_transform_block to add a small
absolute tolerance and cover rotate_fp32: change the assertion to use atol=1e-6
instead of 0.0 and parametrize the test (or add a loop) to call
normalized_hadamard_transform(..., rotate_fp32=bool) so both rotate_fp32=True
and False are exercised alongside block_size; reference the test function
test_hadamard_transform_block and the normalized_hadamard_transform call when
making these changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0568022d-4b9e-4de8-aec0-3dcdba267050
📒 Files selected for processing (1)
tests/gpu/torch/quantization/test_hadamard.py
What does this PR do?
Added support for RHT with non-power-of-2 dimensions. The rotate quantization configuration can also be used to specify `block_size`.
Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A

Additional Information
Summary by CodeRabbit
New Features
Tests
Documentation