Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
📝 Walkthrough

LayerActivationCollector was moved into a new utils module and implemented as a stateful patching helper. sequential_calibrate was updated to use it for per-layer capture/replay. The HuggingFace plugin added decoder-discovery helpers and registered them. Tests were expanded to cover collector behavior and sequential calibration scenarios.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Model
    participant LAC as LayerActivationCollector
    participant ForwardLoop
    participant CalibFunc as Calibration Function
    Note over Model,LAC: Initialization
    ForwardLoop->>LAC: provide model & decoder_layers
    LAC->>Model: _patch_all_layers(decoder_layers)
    Model-->>LAC: patched forwards installed
    loop for each decoder layer i
        Note over ForwardLoop,LAC: Capture phase
        ForwardLoop->>Model: run forward (patched)
        Model->>LAC: patched_forward(mode=capture) -> record inputs, raise EarlyStop
        LAC->>ForwardLoop: return captured inputs for layer i
        Note over ForwardLoop,CalibFunc: Calibration
        ForwardLoop->>CalibFunc: call with per-layer forward that replays captured inputs
        CalibFunc->>Model: apply weight updates to layer i
        Note over ForwardLoop,LAC: Replay phase
        ForwardLoop->>Model: run forward (patched, mode=run)
        Model->>LAC: patched_forward(mode=run) -> replay using cached inputs
        LAC->>LAC: _set_layer_states(i+1)
    end
    Note over Model,LAC: Cleanup
    LAC->>Model: _unpatch_all_layers()
    Model-->>LAC: original forwards restored
```
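The per-layer mode transitions in the diagram can be sketched as a minimal, framework-free stand-in. The class and function names below are illustrative only; the real collector stores this state on each patched `nn.Module` and drives real forward passes:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class _SeqCalibState:
    """Hypothetical per-layer state; field names mirror the diagram."""
    mode: str = "skip"  # one of: skip, run, capture, original
    cached_inputs: deque = field(default_factory=deque)
    collected_inputs: list = field(default_factory=list)

def set_layer_states(states, layer_idx):
    """Transition modes before calibrating layer `layer_idx` (sketch)."""
    if layer_idx > 1:
        done = states[layer_idx - 2]
        done.mode = "skip"              # fully done: free its cached inputs
        done.cached_inputs = deque()
    if layer_idx > 0:
        prev = states[layer_idx - 1]
        prev.mode = "run"               # replay captured inputs once
        prev.cached_inputs = deque(prev.collected_inputs)
        prev.collected_inputs = []
    cur = states[layer_idx]
    cur.mode = "capture"                # record inputs, then early-stop
    cur.collected_inputs = []

states = [_SeqCalibState() for _ in range(4)]
for i in range(3):
    set_layer_states(states, i)
print([s.mode for s in states])  # → ['skip', 'run', 'capture', 'skip']
```

This is why only two layers' activations are ever held at once, giving the O(N) behavior described below.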
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Force-pushed from 13b9033 to 7f72422.
modelopt/torch/quantization/utils.py (Outdated)
```python
class LayerActivationCollector:
    """Helper class for collecting layer activations during forward passes.

@dataclass
```
Can we move the patching & capture related logic to a new file?
Pull request overview
Refactors sequential calibration to avoid O(N²) activation collection by introducing a stateful, model-forward-based activation collection approach (skip/run/capture) that preserves parent-module forward logic.
Changes:
- Replaces the previous decoder-layer detection + per-layer patching with a state-aware `LayerActivationCollector` that patches all decoder layers and uses early-stop to capture inputs efficiently.
- Updates `sequential_calibrate()` to use the new collector and removes the legacy `get_decoder_layers()` heuristic from `modelopt/torch/utils/network.py`.
- Adds/extends unit tests to validate skip/run/capture behavior, decoder-layer discoverer registration, and HuggingFace integration.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/torch/quantization/test_utils.py | Adds unit tests for the new LayerActivationCollector registration/discovery API and skip/run/capture semantics. |
| tests/unit/torch/quantization/test_sequential_calibrate.py | Adds deeper behavioral tests for tuple outputs, inter-layer ops, mode transitions, and cleanup for sequential calibration patching. |
| tests/unit/torch/quantization/test_calib.py | Adds tests for the new sequential calibration support gate and activation propagation behavior. |
| tests/unit/torch/quantization/plugins/test_huggingface.py | Adds tests validating HuggingFace decoder-layer discoverer registration and homogeneous model detection. |
| modelopt/torch/utils/network.py | Removes the old get_decoder_layers() heuristic helper. |
| modelopt/torch/quantization/utils.py | Introduces the new stateful LayerActivationCollector with patched forward and output-metadata-based skipping. |
| modelopt/torch/quantization/plugins/huggingface.py | Registers HuggingFace decoder-layer discoverers with LayerActivationCollector. |
| modelopt/torch/quantization/model_calib.py | Updates sequential_calibrate() to use the new collector lifecycle (patch once, calibrate layer-by-layer, unpatch). |
```python
def is_homogenous_hf_model(model: nn.Module) -> bool:
    if is_nemotron_h_model(model):
        return False
    decoder_layers = get_homogeneous_hf_decoder_layers(model)
    if decoder_layers is None or len(decoder_layers) == 0:
        return False
    layer_classes = {type(layer) for layer in decoder_layers}
    return len(layer_classes) == 1
```
The helper is named is_homogenous_hf_model, but the correct term is “homogeneous”. Since this is a newly introduced API surface (and is referenced by tests/registration), consider renaming it (and its uses) to is_homogeneous_hf_model for clarity and to avoid baking in a misspelling.
```python
def _register_test_discoverer(monkeypatch):
    """Register a simple discoverer that finds model.layers on any model."""
    monkeypatch.setattr(
        LayerActivationCollector,
        "_decoder_layer_support",
        [(lambda m: hasattr(m, "layers"), lambda m: m.layers)],
    )
```
This file now introduces _register_test_discoverer() for decoder-layer discovery, but earlier tests in the same module still call LayerActivationCollector.get_input_activations() / sequential_calibrate() without registering any decoder-layer support (and without patching collector layers). With the new LayerActivationCollector implementation, those tests will fail unless they register a discoverer (like this helper) and patch/unpatch appropriately.
modelopt/torch/quantization/utils.py (Outdated)
```diff
 _decoder_layer_support: list[tuple[Any, Any]] = []
 _LAYER_ATTR = "_seq_calib"

 def __init__(self, model: nn.Module):
     self.model = model
     self._decoder_layers: nn.ModuleList | None = None
     self._layer_to_idx: dict[nn.Module, int] = {}
     self._patched = False

 # ------------------------------------------------------------------
 # Decoder-layer discovery
 # ------------------------------------------------------------------

-@staticmethod
-def _patch_and_initialize_layer(layer: torch.nn.Module, stop_after_collection: bool = False):
-    """Patch a layer to collect inputs during forward passes."""
+def get_decoder_layers(model: nn.Module) -> nn.ModuleList | None:
+    """Return decoder layers supported by sequential calibration."""
+    for is_supported, discoverer in LayerActivationCollector._decoder_layer_support:
+        if not is_supported(model):
+            continue
+        decoder_layers = discoverer(model)
+        if decoder_layers is not None:
+            return decoder_layers
+    return None
```
LayerActivationCollector._decoder_layer_support defaults to an empty list, so LayerActivationCollector.is_supported() (and thus sequential_calibrate()) will always fail unless some plugin/module has been imported that registers discoverers. If sequential_calibrate/collector are intended to be usable when imported directly (without modelopt.torch.quantization which imports plugins), consider registering baseline discoverers for common patterns (e.g. model.layers, model.model.layers, model.decoder.layers, model.transformer.h) in utils.py, similar to the removed get_decoder_layers heuristic.
modelopt/torch/quantization/utils.py (Outdated)
```python
    Layers before the target are skipped or re-run (if just calibrated), the
    target layer captures its inputs, and an early-stop prevents unnecessary
    computation beyond the target.
    """
```
LayerActivationCollector.get_input_activations() assumes _patch_all_layers() has already been called (it indexes into _layer_to_idx and relies on patched forwards). As written, calling this public method without patching will raise KeyError / behave incorrectly. Either make patching lazy inside get_input_activations() (patch on first use) or add a clear assertion/error telling callers to call _patch_all_layers() first.
```diff
     """
+    # Ensure layers have been patched before attempting to collect activations.
+    if not hasattr(self, "_layer_to_idx") or layer not in self._layer_to_idx:
+        raise RuntimeError(
+            "LayerActivationCollector.get_input_activations() was called before layers were patched. "
+            "Make sure to call LayerActivationCollector._patch_all_layers() (or the appropriate "
+            "patching method) before requesting input activations."
+        )
```
Logic Correctness Review

Overall, this is a well-designed refactor that correctly implements the state machine for sequential calibration. However, I found several logic issues that should be addressed:

🔴 Critical Issues

1. State transitions for layers 0 and 1

In `_set_layer_states`:

```python
if layer_idx > 1:
    done = self._decoder_layers[layer_idx - 2]._seq_calib
    done.mode = "skip"
    done.cached_inputs = []  # Free memory
if layer_idx > 0:
    prev = self._decoder_layers[layer_idx - 1]._seq_calib
    prev.mode = "run"
    prev.cached_inputs = prev.collected_inputs  # Move collected -> cached
    prev.collected_inputs = []
```

On closer inspection, the transition logic is correct for the intended use case: after calibration, layer 0 moves to "original", and when calibrating layer 1, `_set_layer_states` replays layer 0 from its captured inputs, so the suspected race does not materialize. However, there is a different issue:

2. Missing validation: What if
Actionable comments posted: 2
🧹 Nitpick comments (1)
modelopt/torch/quantization/plugins/huggingface.py (1)
1199-1206: Consider making the generic HF discoverer structural instead of homogeneous-only.

`LayerActivationCollector` now tracks `output_meta` per layer, and the new regression cases cover heterogeneous layer stacks. Requiring a single layer class here seems stricter than the runtime needs and will exclude mixed-block decoders that still expose a clean `model.model.layers` list.
LayerActivationCollectornow tracksoutput_metaper layer, and the new regression cases cover heterogeneous layer stacks. Requiring a single layer class here seems stricter than the runtime needs and will exclude mixed-block decoders that still expose a cleanmodel.model.layerslist.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/plugins/huggingface.py` around lines 1199 - 1206, The current is_homogenous_hf_model function rejects decoders with mixed layer classes; instead relax it to a structural discoverer: in is_homogenous_hf_model (and still respect is_nemotron_h_model) return True whenever get_homogeneous_hf_decoder_layers(model) yields a non-empty decoder layer list and the model exposes a layers sequence (e.g. model.model.layers) whose elements provide the runtime metadata LayerActivationCollector expects (check for the presence of output_meta or another per-layer attribute/ability that LayerActivationCollector uses), and remove the strict len(layer_classes) == 1 check; keep using get_homogeneous_hf_decoder_layers and is_nemotron_h_model as anchors.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 913af19a-8050-4673-b980-42453284b8d7
📒 Files selected for processing (9)
- modelopt/torch/quantization/activation_collector.py
- modelopt/torch/quantization/model_calib.py
- modelopt/torch/quantization/plugins/huggingface.py
- modelopt/torch/quantization/utils.py
- modelopt/torch/utils/network.py
- tests/unit/torch/quantization/plugins/test_huggingface.py
- tests/unit/torch/quantization/test_calib.py
- tests/unit/torch/quantization/test_sequential_calibrate.py
- tests/unit/torch/quantization/test_utils.py
💤 Files with no reviewable changes (1)
- modelopt/torch/utils/network.py
```python
cached_inputs: deque = field(default_factory=deque)
collected_inputs: list = field(default_factory=list)
output_meta: tuple | None = None
```
output_meta only tracks the last batch.
`run` overwrites a single `output_meta` on every replayed batch, and `skip` reuses that one shape for every later batch. With a normal `drop_last=False` loader or variable sequence lengths, a later pass can synthesize dummy outputs with the wrong dimensions for earlier batches. This needs per-batch metadata that can be replayed in order on each future pass, not one shared slot per layer.
Also applies to: 163-179
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/activation_collector.py` around lines 50 - 52,
The current implementation stores a single output_meta that gets overwritten in
ActivationCollector.run and reused in skip, causing replayed batches to get
incorrect shapes; change output_meta from a single tuple to a per-batch sequence
(e.g., a deque or list) named output_meta_list or output_meta: deque, append the
metadata for each captured batch inside ActivationCollector.run (use the same
indexing order as cached_inputs/collected_inputs), update
ActivationCollector.skip and any replay logic to consume/peek the corresponding
per-batch metadata in FIFO order (maintaining alignment with cached_inputs
entries) and remove or rotate entries consistently after replays so each
replayed batch uses its original per-batch output_meta rather than a shared slot
(also apply the same change to the other occurrence referenced around lines
163-179).
```python
def _set_layer_states(self, layer_idx: int):
    """Transition layer modes for the next calibration step.

    When calibrating layer *i*, three transitions happen:

    * Layer ``i - 2`` → **skip** (fully done, free its cached inputs).
    * Layer ``i - 1`` → **run** (replay captured inputs with calibrated weights).
    * Layer ``i`` → **capture** (record inputs, then early-stop).
    """
    assert self._decoder_layers is not None

    if layer_idx > 1:
        done = self._decoder_layers[layer_idx - 2]._seq_calib
        done.mode = "skip"
        done.cached_inputs = deque()

    if layer_idx > 0:
        prev = self._decoder_layers[layer_idx - 1]._seq_calib
        prev.mode = "run"
        prev.cached_inputs = deque(prev.collected_inputs)
        prev.collected_inputs = []

    cur = self._decoder_layers[layer_idx]._seq_calib
    cur.mode = "capture"
    cur.collected_inputs = []
```
Fail fast when the sequential state machine falls out of sync.
These transitions assume the previous pass both captured inputs and replayed once. If a caller skips a decoder layer, or forward_loop() never reaches the target, you only discover it later when a skipped layer hits the RuntimeError on Line 165. Please validate done.output_meta / prev.collected_inputs before switching modes, and reject an empty capture right after forward_loop() so the failure is reported at the source.
Suggested guardrails

```diff
 def _set_layer_states(self, layer_idx: int):
     assert self._decoder_layers is not None
     if layer_idx > 1:
         done = self._decoder_layers[layer_idx - 2]._seq_calib
+        if done.output_meta is None:
+            raise RuntimeError(
+                f"Layer {done.name} cannot enter 'skip' before a successful replay."
+            )
         done.mode = "skip"
         done.cached_inputs = deque()
     if layer_idx > 0:
         prev = self._decoder_layers[layer_idx - 1]._seq_calib
+        if not prev.collected_inputs:
+            raise RuntimeError(
+                f"Layer {prev.name} has no captured inputs to replay."
+            )
         prev.mode = "run"
         prev.cached_inputs = deque(prev.collected_inputs)
         prev.collected_inputs = []
```

```diff
     forward_loop(self.model)
     info = layer._seq_calib
     inputs = list(info.collected_inputs)
     # After capture, set to original so calib_func can call the layer's
     # real forward directly. The layer will transition to run → skip
     # in subsequent iterations via _set_layer_states.
     info.mode = "original"
+    if not inputs:
+        raise RuntimeError(
+            f"Layer {info.name} did not capture any inputs during forward_loop()."
+        )
     return inputs
```

Also applies to: 293-307
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/activation_collector.py` around lines 241 - 265,
The state-transition code in _set_layer_states assumes prior stages
captured/replayed successfully; add fail-fast checks before changing modes:
verify done.output_meta is present (non-empty) before setting done.mode="skip"
and verify prev.collected_inputs is non-empty before setting prev.mode="run" and
copying to prev.cached_inputs; if either check fails, raise a clear RuntimeError
indicating the specific layer index and missing data (use layer_idx and
references to self._decoder_layers[...] ._seq_calib). After forward_loop()
completes, also validate that the current layer's collected_inputs is non-empty
and raise a descriptive error immediately if the capture is empty. Apply the
same guardrails to the analogous transition block later in the file (the other
spot that manipulates ._seq_calib, .collected_inputs and .cached_inputs).
cjluo-nv left a comment:
Review Summary
The core refactoring from O(N²) to O(N) sequential calibration via skip/run/capture state machine is well-designed. The architecture, cleanup handling, and test coverage are all strong. However, there are a few issues that should be addressed before merge.
Critical: Missing model pattern registrations (regression)
The old get_decoder_layers() in network.py supported 5 model patterns. The PR only registers 2 (Nemotron-H and homogeneous HF). Missing support for:
- Megatron/MCore: `model.decoder.layers` — no registration in `plugins/megatron.py`
- GPT-style: `model.transformer.h`
- Direct layers: `model.layers` (when it's an `nn.ModuleList`)
- Nemotron Super/Nano without `block_type`: old code matched `model.backbone.layers` unconditionally; new `get_nemotron_h_decoder_layers` requires `block_type` on `layers[0]`
Models that previously worked with sequential calibration will now raise "Could not find transformer layers".
Medium Issues
- Typo in public API: `is_homogenous_hf_model` should be `is_homogeneous_hf_model` (missing 'e'). It appears in the function name, test name, and registration. Worth fixing now before it becomes a stable API.
- `forward_loop is None` guard removed: The old code explicitly raised `ValueError("forward_loop must not be None ...")`. The PR removes this check (and the corresponding test `test_seq_calib_raises_on_none_forward_loop`) without adding an equivalent guard. A `None` forward_loop will now produce an unhelpful `TypeError` deep in the stack.
- `_decoder_layer_support` as a mutable class variable (activation_collector.py:83): This list is shared across all instances and grows monotonically. Tests work around this with `monkeypatch`, but in production, if plugins are re-imported or the module is reloaded, entries could accumulate. Consider documenting this as intentional or adding a guard.
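A guard for the registry issue could dedupe registrations by the discoverer's identity. This is a sketch with hypothetical names; the real registry is `LayerActivationCollector._decoder_layer_support`, and keying on module + qualified name is one possible choice:

```python
class CollectorSketch:
    """Sketch: class-level registry with a dedupe guard, so re-importing a
    plugin module does not grow the shared list."""
    _decoder_layer_support = []  # shared across instances, by design
    _registered_keys = set()

    @classmethod
    def register_decoder_layer_support(cls, is_supported, discoverer):
        # Key on module + qualified name so a reloaded module's fresh
        # function objects still match their earlier registration.
        key = (is_supported.__module__, is_supported.__qualname__)
        if key in cls._registered_keys:
            return  # already registered
        cls._registered_keys.add(key)
        cls._decoder_layer_support.append((is_supported, discoverer))

def is_demo_model(model):
    return hasattr(model, "layers")

def get_demo_layers(model):
    return model.layers

CollectorSketch.register_decoder_layer_support(is_demo_model, get_demo_layers)
CollectorSketch.register_decoder_layer_support(is_demo_model, get_demo_layers)
print(len(CollectorSketch._decoder_layer_support))  # → 1
```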
Minor
- License header on the new file says `2024`; it should likely be `2025` for a new file.
- `copy.deepcopy(meta[1])` in `_zeros_from_meta` for the `"other"` case could be expensive for complex non-tensor outputs. A comment noting this is expected to be lightweight values (e.g. `None`) would help.
CI
linux and unit-pr-required-check are failing — should be investigated before merge.
What's Good
- Clean state machine design with skip/run/capture/original modes
- Proper `try/finally` cleanup in both `sequential_calibrate` and `_patch_all_layers`
- Output metadata approach (`_extract_output_meta` / `_zeros_from_meta`) correctly handles tuple/list/tensor outputs for skip layers
- Excellent test coverage (~470 new test lines) covering mode transitions, heterogeneous layers, tuple unpacking, inter-layer norm, error paths, and cleanup
- Extensible plugin-based decoder discovery via `register_decoder_layer_support`
- Backward-compatible re-export of `LayerActivationCollector` from `utils.py`
```python
LayerActivationCollector.register_decoder_layer_support(
    is_nemotron_h_model, get_nemotron_h_decoder_layers
)

LayerActivationCollector.register_decoder_layer_support(
    is_homogenous_hf_model, get_homogeneous_hf_decoder_layers
)
```
There was a problem hiding this comment.
```python
output_meta: tuple | None = None


class LayerActivationCollector:
```
qq, will this work in TP/EP/PP scenarios, which we usually use for mcore?
It might need additional sync logic to work for these scenarios.
🧹 Nitpick comments (1)
modelopt/torch/quantization/plugins/huggingface.py (1)
1209-1216: Consider extending pattern coverage for other model architectures.

The current implementation only checks `model.model.layers`. As noted in a previous review, this doesn't cover patterns like `model.decoder.layers` (Megatron) or other architectures. While not blocking, consider whether additional patterns should be checked here or in a follow-up PR to improve model coverage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/plugins/huggingface.py` around lines 1209 - 1216, get_homogeneous_hf_decoder_layers currently only returns model.model.layers; extend its checks to cover other common HF decoder patterns (e.g., model.decoder.layers for Megatron-style models, model.model.decoder.layers, and model.transformer.layers) so more architectures are detected. Inside get_homogeneous_hf_decoder_layers, add checks in order (e.g., hasattr(model, "decoder") and hasattr(model.decoder, "layers"), then hasattr(model, "model") and hasattr(model.model, "decoder") and hasattr(model.model.decoder, "layers"), then hasattr(model, "transformer") and hasattr(model.transformer, "layers")) and return the first matching nn.ModuleList; keep the early return None behavior if none match. Ensure you reference this exact function name when implementing the changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 221748c7-9696-4123-9104-58da5f79313e
📒 Files selected for processing (6)
- modelopt/torch/quantization/activation_collector.py
- modelopt/torch/quantization/model_calib.py
- modelopt/torch/quantization/plugins/huggingface.py
- modelopt/torch/quantization/utils.py
- tests/unit/torch/quantization/plugins/test_huggingface.py
- tests/unit/torch/quantization/test_sequential_calibrate.py
🚧 Files skipped from review as they are similar to previous changes (2)
- modelopt/torch/quantization/activation_collector.py
- tests/unit/torch/quantization/plugins/test_huggingface.py
@sugunav14 can we make modelopt/torch/quantization/utils a folder and move this file into it? We should preserve backward compatibility (so we should set up `__init__.py` properly).
Okay, I can do that!
```python
# ------------------------------------------------------------------
# Patched forward
# ------------------------------------------------------------------
```

Obvious comment, can we remove it?

Suggested change:

```diff
-# ------------------------------------------------------------------
-# Patched forward
-# ------------------------------------------------------------------
```
```python
        return tuple(LayerActivationCollector._zeros_from_meta(m) for m in meta[1])
    if tag == "list":
        return [LayerActivationCollector._zeros_from_meta(m) for m in meta[1]]
    return copy.deepcopy(meta[1])
```

nit:

```diff
-    return copy.deepcopy(meta[1])
+    return meta[1]
```
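For context, the `_zeros_from_meta` recursion being discussed can be sketched dependency-free. A plain list stands in for `torch.zeros`, and the `"other"` branch follows the nit by returning the value directly; the `(tag, payload)` tuples are assumed from the snippet above:

```python
def zeros_from_meta(meta):
    """Rebuild a dummy output from recorded (tag, payload) metadata.

    Tags follow the snippet above: a real implementation would return
    torch.zeros(*payload) for the "tensor" tag; a flat list of zeros
    stands in here to keep the sketch dependency-free.
    """
    tag, payload = meta
    if tag == "tensor":
        return [0.0] * payload[-1]  # stand-in for torch.zeros(*payload)
    if tag == "tuple":
        return tuple(zeros_from_meta(m) for m in payload)
    if tag == "list":
        return [zeros_from_meta(m) for m in payload]
    # "other": per the review nit, lightweight values (e.g. None) can be
    # returned directly instead of deep-copied.
    return payload

out = zeros_from_meta(("tuple", [("tensor", (2, 4)), ("other", None)]))
print(out)  # → ([0.0, 0.0, 0.0, 0.0], None)
```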
```python
# ------------------------------------------------------------------

@staticmethod
def _patched_forward(self, *args, **kwargs):
```

nit: we could define this in `_patch_all_layers`.
@sugunav14 have you tested an end-to-end flow with it? As a follow-up later, we can add a registry-based plugin for saving the model so far (to enable resume during calibration).
Yeah, I did a sanity check on the e2e generation with max sequential calibrate on Qwen3-8B and Nemotron-Nano-v3. The generated outputs look okay. I was thinking of doing the accuracy checks along with the GPTQ PR!
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Force-pushed from 604f2e1 to c18d109.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/torch/quantization/test_sequential_calibrate.py (1)
372-398: ⚠️ Potential issue | 🟠 Major
These tests lock in the wrong missing-inputs behavior.
One test treats an empty `forward_loop` as success, and the other codifies the late `AssertionError` after a layer reaches `"run"` with no cache. That makes it harder to move to the clearer fail-fast behavior when no replay inputs were captured.
Also applies to: 599-613
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/torch/quantization/test_sequential_calibrate.py` around lines 372 - 398, The tests (notably test_seq_calib_empty_forward_loop) currently accept an empty forward_loop as success which prevents moving to a fail-fast behavior; update this and the related test (around the other occurrence) so they assert that sequential_calibrate(...) raises an AssertionError when forward_loop supplies no replay inputs. Concretely, change test_seq_calib_empty_forward_loop and the test at the other location to call sequential_calibrate with forward_loop=lambda m: None inside a pytest.raises(AssertionError) context (or equivalent) and remove the existing logic that treats zero-count replays as success; keep references to sequential_calibrate, calib_func, forward_loop, and the test function names to locate the code.
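The fail-fast behavior the review asks for could be expressed roughly as below. This is a self-contained sketch: `sequential_calibrate` here is a simplified stand-in, not the ModelOpt function, and the real signature in the PR may differ.

```python
# Sketch of the fail-fast behavior: an empty forward_loop should raise
# instead of being treated as a successful calibration.
def sequential_calibrate(model, calib_func, forward_loop):
    captured = []  # stands in for the collector's captured per-layer inputs
    forward_loop(model)  # an empty forward_loop captures nothing
    if not captured:
        raise RuntimeError("No calibration inputs were captured")
    calib_func(model, lambda m: None)


def test_seq_calib_empty_forward_loop():
    try:
        sequential_calibrate(object(), lambda *a: None, forward_loop=lambda m: None)
    except RuntimeError:
        return True  # the empty loop is now rejected explicitly
    return False
```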
♻️ Duplicate comments (2)
modelopt/torch/quantization/utils/activation_collector.py (2)
271-283: ⚠️ Potential issue | 🔴 Critical
Fail fast on invalid state transitions and empty capture results.
`_set_layer_states()` currently transitions into `run`/`skip` without verifying prerequisites, and `get_input_activations()` returns even when capture produced no inputs. This delays failure to later execution paths and makes debugging harder.
🔧 Proposed guardrails

diff --git a/modelopt/torch/quantization/utils/activation_collector.py b/modelopt/torch/quantization/utils/activation_collector.py
@@
         if layer_idx > 1:
             done = self._decoder_layers[layer_idx - 2]._seq_calib
+            if done.output_meta is None:
+                raise RuntimeError(
+                    f"Layer {layer_idx - 2} ({done.name}) cannot enter 'skip' without output metadata."
+                )
             done.mode = "skip"
@@
         if layer_idx > 0:
             prev = self._decoder_layers[layer_idx - 1]._seq_calib
+            if not prev.collected_inputs:
+                raise RuntimeError(
+                    f"Layer {layer_idx - 1} ({prev.name}) has no captured inputs to replay in 'run' mode."
+                )
             prev.mode = "run"
             prev.cached_inputs = deque(prev.collected_inputs)
             prev.collected_inputs = []
@@
         inputs = list(info.collected_inputs)
+        if not inputs:
+            raise RuntimeError(
+                f"Layer {layer_idx} ({info.name}) did not capture any inputs during forward_loop()."
+            )
         # After capture, set to original so calib_func can call the layer's

Also applies to: 335-340
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/utils/activation_collector.py` around lines 271 - 283, The state transition in ActivationCollector._set_layer_states (the block manipulating self._decoder_layers[...] ._seq_calib) must validate prerequisites before flipping modes: check that the target _seq_calib objects exist and that prev.collected_inputs is non-empty before setting prev.mode="run" and assigning prev.cached_inputs = deque(prev.collected_inputs); if collected_inputs is empty raise/return an explicit error instead of silently continuing. Likewise, when setting done.mode="skip" ensure done has required output_meta and clear cached_inputs only after verifying skip semantics. Also update get_input_activations to explicitly raise or return an error when collected_inputs is empty (avoid returning an empty activation), and apply the same guards to the analogous block around lines 335-340 that manipulates _seq_calib, prev, done, collected_inputs and cached_inputs.
52-52: ⚠️ Potential issue | 🟠 Major
Single `output_meta` is not batch-safe for `skip` replay.
`run` overwrites one metadata slot per batch, then `skip` reuses that one shape/type for all later batches. With variable batch shapes (e.g., the last batch), dummy outputs can be wrong.
🔧 Proposed direction (per-batch metadata queue)

diff --git a/modelopt/torch/quantization/utils/activation_collector.py b/modelopt/torch/quantization/utils/activation_collector.py
@@
 @dataclass
 class _LayerCalibState:
@@
-    output_meta: tuple | None = None
+    output_meta_queue: deque = field(default_factory=deque)
@@
         if info.mode == "skip":
-            if info.output_meta is None:
+            if not info.output_meta_queue:
                 raise RuntimeError(
-                    f"Layer {info.name} is in 'skip' mode but has no output_meta. "
+                    f"Layer {info.name} is in 'skip' mode but has no output metadata queue. "
                     "This indicates a state-machine bug: the layer should have run "
                     "in 'run' mode (which sets output_meta) before transitioning to 'skip'."
                 )
-            return LayerActivationCollector._zeros_from_meta(info.output_meta)
+            meta = info.output_meta_queue[0]
+            # rotate to preserve per-batch ordering across repeated forward loops
+            info.output_meta_queue.rotate(-1)
+            return LayerActivationCollector._zeros_from_meta(meta)
@@
         if info.mode == "run":
@@
         output = self._original_forward(*real_args, **real_kwargs)
-        info.output_meta = LayerActivationCollector._extract_output_meta(output)
+        info.output_meta_queue.append(LayerActivationCollector._extract_output_meta(output))
         return output

Also applies to: 173-180, 188-188
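The `rotate(-1)` trick in the proposed queue can be checked in isolation: reading the front entry and then rotating yields the per-batch metadata in capture order on every replay pass. This is a standalone illustration, not the PR's code.

```python
from collections import deque

# One metadata entry per captured batch, in capture order.
meta_queue = deque(["meta_batch0", "meta_batch1", "meta_batch2"])

served = []
for _ in range(6):  # two full replay passes over three batches
    served.append(meta_queue[0])  # read the front entry
    meta_queue.rotate(-1)         # move it to the back, preserving cyclic order

# Each pass sees the batches in the original capture order.
```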
🧹 Nitpick comments (3)
tests/unit/torch/quantization/plugins/test_huggingface.py (1)
210-228: Add a Nemotron-H regression alongside the generic HF path.
The production change adds a dedicated `is_nemotron_h_model`/`get_nemotron_h_decoder_layers` branch plus an ordering constraint, but these tests only exercise the generic `model.model.layers` path. A tiny fake model with `backbone.layers[0].block_type` would keep that new branch from regressing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/torch/quantization/plugins/test_huggingface.py` around lines 210 - 228, The tests currently only exercise the generic HF path and miss the new Nemotron-H branch and ordering; add a regression test alongside the existing HF tests that constructs a tiny fake model exposing backbone.layers[0].block_type (or use the provided tiny fake model variant) and asserts that LayerActivationCollector._decoder_layer_support contains the (is_nemotron_h_model, get_nemotron_h_decoder_layers) entry in the expected order relative to (is_homogeneous_hf_model, get_homogeneous_hf_decoder_layers), and verify LayerActivationCollector.get_decoder_layers(model) returns get_nemotron_h_decoder_layers(model) for that model to prevent regressions.modelopt/torch/quantization/utils/__init__.py (1)
18-34: Use explicit re-exports here.
`from .core_utils import *` now pulls every non-private symbol from `core_utils.py` into the package namespace, but `__all__` still omits `LayerActivationCollector`. That leaves the public surface both wider and less consistent than intended; please switch this file to explicit imports (or restore a constrained `core_utils.__all__`) and add `LayerActivationCollector` to `__all__`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/utils/__init__.py` around lines 18 - 34, Replace the wildcard import from core_utils with explicit imports for the symbols listed in __all__ (EXPORT_MODE, convert_quantization_axis_to_reduce_axis, export_torch_mode, is_quantized, is_quantized_column_parallel_linear, is_quantized_linear, is_quantized_row_parallel_linear, reduce_amax, reduce_sum, replace_function, update_quant_cfg_with_kv_cache_quant, weight_attr_names) by importing them directly from .core_utils, import LayerActivationCollector from .activation_collector, and add "LayerActivationCollector" to the __all__ list so the package public surface is explicit and consistent.tests/unit/torch/quantization/test_sequential_calibrate.py (1)
696-745: This doesn't verify what layer 1 actually replayed.
The final assertion only proves that layer 0's weights were doubled by the end of calibration. The test would still pass if replay into layer 1 used stale activations, so please capture layer 1's actual inputs during `weight_doubling_calib` and assert those values directly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/torch/quantization/test_sequential_calibrate.py` around lines 696 - 745, The test currently only asserts layer 0's final weights changed; instead capture what was actually replayed into layer 1 during calibration and assert it equals the doubled activations. Add a closure list (e.g. captured_inputs = []) and modify weight_doubling_calib to, when the `layer` is `model.layers[1]` (or compare identity), register a temporary forward hook on that layer that appends input[0].detach().clone() to captured_inputs, perform the existing weight doubling and call layer_forward_loop(layer), then remove the hook; after sequential_calibrate assert captured_inputs is non-empty and torch.allclose(captured_inputs[0], activations_before_weight_update * 2.0). This uses the existing symbols test_run_layer_reflects_weight_updates, weight_doubling_calib, sequential_calibrate, model.layers[0], and model.layers[1].
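Capturing a layer's real inputs, as suggested above, uses the standard `register_forward_hook` API; a minimal sketch (the layer and shapes here are illustrative stand-ins for `model.layers[1]`):

```python
import torch

layer = torch.nn.Linear(4, 4)
captured_inputs = []

# Forward hooks receive (module, inputs, output); clone the first
# positional input so later weight updates cannot mutate the record.
handle = layer.register_forward_hook(
    lambda mod, inputs, output: captured_inputs.append(inputs[0].detach().clone())
)

x = torch.randn(2, 4)
layer(x)
handle.remove()  # always detach the hook once the capture is done
```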
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 1864-1872: The loop calls calib_func even when
input_getter.get_input_activations(...) returns an empty list, causing silent
no-op calibration; before constructing _layer_forward_loop and calling
calib_func for each layer (symbols: transformer_layers,
input_getter.get_input_activations, forward_loop, layer_inputs,
_layer_forward_loop, calib_func), check whether layer_inputs is empty and if so
reject it (either raise an exception or skip with a clear error/warning and do
not call calib_func) so layers with no captured batches are not treated as
successfully calibrated.
In `@tests/unit/torch/quantization/test_sequential_calibrate.py`:
- Around line 183-185: Wrap the sequence that patches/unpatches in a try/finally
so the model is always unpatched if an exception occurs: call
collector._patch_all_layers() before the try, invoke
collector.get_input_activations(model.layers[0], forward_loop) (and any other
test calls) inside the try, and ensure collector._unpatch_all_layers() is called
in the finally block; apply the same change for the other occurrence that uses
the same three calls to avoid leaving the model patched.
---
Outside diff comments:
In `@tests/unit/torch/quantization/test_sequential_calibrate.py`:
- Around line 372-398: The tests (notably test_seq_calib_empty_forward_loop)
currently accept an empty forward_loop as success which prevents moving to a
fail-fast behavior; update this and the related test (around the other
occurrence) so they assert that sequential_calibrate(...) raises an
AssertionError when forward_loop supplies no replay inputs. Concretely, change
test_seq_calib_empty_forward_loop and the test at the other location to call
sequential_calibrate with forward_loop=lambda m: None inside a
pytest.raises(AssertionError) context (or equivalent) and remove the existing
logic that treats zero-count replays as success; keep references to
sequential_calibrate, calib_func, forward_loop, and the test function names to
locate the code.
---
Duplicate comments:
In `@modelopt/torch/quantization/utils/activation_collector.py`:
- Around line 271-283: The state transition in
ActivationCollector._set_layer_states (the block manipulating
self._decoder_layers[...] ._seq_calib) must validate prerequisites before
flipping modes: check that the target _seq_calib objects exist and that
prev.collected_inputs is non-empty before setting prev.mode="run" and assigning
prev.cached_inputs = deque(prev.collected_inputs); if collected_inputs is empty
raise/return an explicit error instead of silently continuing. Likewise, when
setting done.mode="skip" ensure done has required output_meta and clear
cached_inputs only after verifying skip semantics. Also update
get_input_activations to explicitly raise or return an error when
collected_inputs is empty (avoid returning an empty activation), and apply the
same guards to the analogous block around lines 335-340 that manipulates
_seq_calib, prev, done, collected_inputs and cached_inputs.
---
Nitpick comments:
In `@modelopt/torch/quantization/utils/__init__.py`:
- Around line 18-34: Replace the wildcard import from core_utils with explicit
imports for the symbols listed in __all__ (EXPORT_MODE,
convert_quantization_axis_to_reduce_axis, export_torch_mode, is_quantized,
is_quantized_column_parallel_linear, is_quantized_linear,
is_quantized_row_parallel_linear, reduce_amax, reduce_sum, replace_function,
update_quant_cfg_with_kv_cache_quant, weight_attr_names) by importing them
directly from .core_utils, import LayerActivationCollector from
.activation_collector, and add "LayerActivationCollector" to the __all__ list so
the package public surface is explicit and consistent.
In `@tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 210-228: The tests currently only exercise the generic HF path and
miss the new Nemotron-H branch and ordering; add a regression test alongside the
existing HF tests that constructs a tiny fake model exposing
backbone.layers[0].block_type (or use the provided tiny fake model variant) and
asserts that LayerActivationCollector._decoder_layer_support contains the
(is_nemotron_h_model, get_nemotron_h_decoder_layers) entry in the expected order
relative to (is_homogeneous_hf_model, get_homogeneous_hf_decoder_layers), and
verify LayerActivationCollector.get_decoder_layers(model) returns
get_nemotron_h_decoder_layers(model) for that model to prevent regressions.
In `@tests/unit/torch/quantization/test_sequential_calibrate.py`:
- Around line 696-745: The test currently only asserts layer 0's final weights
changed; instead capture what was actually replayed into layer 1 during
calibration and assert it equals the doubled activations. Add a closure list
(e.g. captured_inputs = []) and modify weight_doubling_calib to, when the
`layer` is `model.layers[1]` (or compare identity), register a temporary forward
hook on that layer that appends input[0].detach().clone() to captured_inputs,
perform the existing weight doubling and call layer_forward_loop(layer), then
remove the hook; after sequential_calibrate assert captured_inputs is non-empty
and torch.allclose(captured_inputs[0], activations_before_weight_update * 2.0).
This uses the existing symbols test_run_layer_reflects_weight_updates,
weight_doubling_calib, sequential_calibrate, model.layers[0], and
model.layers[1].
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d6398879-2fd1-4cd7-98e8-2a6fc14aacbf
📒 Files selected for processing (9)
- modelopt/torch/quantization/model_calib.py
- modelopt/torch/quantization/plugins/huggingface.py
- modelopt/torch/quantization/utils/__init__.py
- modelopt/torch/quantization/utils/activation_collector.py
- modelopt/torch/quantization/utils/core_utils.py
- tests/unit/torch/quantization/plugins/test_huggingface.py
- tests/unit/torch/quantization/test_calib.py
- tests/unit/torch/quantization/test_sequential_calibrate.py
- tests/unit/torch/quantization/test_utils.py
for layer_idx, layer in enumerate(transformer_layers):
    print_rank_0(f"Calibrating layer {layer_idx}")
    layer_inputs = input_getter.get_input_activations(layer, forward_loop)

    # Define a forward loop for the current layer
    def _layer_forward_loop(m, _inputs=layer_inputs):
        for args, kwargs_input in _inputs:
            m(*args, **kwargs_input)

    # Call calibration function
    calib_func(layer, _layer_forward_loop, **calib_kwargs)
    del layer_inputs
    torch.cuda.empty_cache()
Reject empty captures before invoking calib_func.
If forward_loop yields no calibration batches, layer_inputs is empty and we silently run calib_func with a no-op replay loop. That makes sequential calibration look successful while leaving the layer uncalibrated.
Suggested guard
for layer_idx, layer in enumerate(transformer_layers):
print_rank_0(f"Calibrating layer {layer_idx}")
layer_inputs = input_getter.get_input_activations(layer, forward_loop)
+ if not layer_inputs:
+ raise RuntimeError(
+ f"No calibration inputs were captured for layer {layer_idx}. "
+ "Please ensure `forward_loop` executes at least one forward pass."
+ )
def _layer_forward_loop(m, _inputs=layer_inputs):
for args, kwargs_input in _inputs:
m(*args, **kwargs_input)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/model_calib.py` around lines 1864 - 1872, The
loop calls calib_func even when input_getter.get_input_activations(...) returns
an empty list, causing silent no-op calibration; before constructing
_layer_forward_loop and calling calib_func for each layer (symbols:
transformer_layers, input_getter.get_input_activations, forward_loop,
layer_inputs, _layer_forward_loop, calib_func), check whether layer_inputs is
empty and if so reject it (either raise an exception or skip with a clear
error/warning and do not call calib_func) so layers with no captured batches are
not treated as successfully calibrated.
Actionable comments posted: 1
♻️ Duplicate comments (4)
modelopt/torch/quantization/model_calib.py (1)
1862-1870: ⚠️ Potential issue | 🟠 Major
Reject empty captures before invoking `calib_func`.
If `forward_loop` produces no batches, `layer_inputs` is empty and this path silently treats the layer as calibrated while replaying nothing. Raise here instead of constructing a no-op `_layer_forward_loop`. `test_seq_calib_empty_forward_loop()` should then flip to expecting that failure.
Suggested guard

 for layer_idx, layer in enumerate(transformer_layers):
     print_rank_0(f"Calibrating layer {layer_idx}")
     layer_inputs = input_getter.get_input_activations(layer, forward_loop)
+    if not layer_inputs:
+        raise RuntimeError(
+            f"No calibration inputs were captured for layer {layer_idx}. "
+            "Please ensure `forward_loop` executes at least one forward pass."
+        )
     def _layer_forward_loop(m, _inputs=layer_inputs):
         for args, kwargs_input in _inputs:
             m(*args, **kwargs_input)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/model_calib.py` around lines 1862 - 1870, The loop over transformer_layers currently proceeds even if input_getter.get_input_activations(transformer_layer, forward_loop) returns an empty list, causing calib_func to be called with a no-op _layer_forward_loop; change this by checking layer_inputs after calling input_getter.get_input_activations and raise a clear exception (e.g., ValueError) when it's empty (reference symbols: transformer_layers, input_getter.get_input_activations, layer_inputs, _layer_forward_loop, calib_func, forward_loop) so the caller/test (test_seq_calib_empty_forward_loop) can expect/fail accordingly.tests/unit/torch/quantization/test_sequential_calibrate.py (1)
183-185: ⚠️ Potential issue | 🟡 Minor
Always unpatch in these two tests.
These paths still call `_patch_all_layers()`/`_unpatch_all_layers()` without a `try`/`finally`, so an unexpected failure in `get_input_activations()` leaves patched forwards behind and can cascade into later cases.
Also applies to: 626-630
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/torch/quantization/test_sequential_calibrate.py` around lines 183 - 185, The tests call collector._patch_all_layers() then collector.get_input_activations(...) and finally collector._unpatch_all_layers() but lack a try/finally so failures leave patches active; update the test(s) that use _patch_all_layers/_unpatch_all_layers (including the block invoking get_input_activations on model.layers[0] and the duplicated block around lines 626-630) to wrap the call to get_input_activations inside a try/finally where collector._unpatch_all_layers() is invoked in the finally clause to guarantee cleanup even on exceptions.modelopt/torch/quantization/utils/activation_collector.py (2)
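The cleanup pattern being requested is the usual patch-inside-`try`/`finally`; a sketch with a dummy collector standing in for `LayerActivationCollector`:

```python
# Dummy stand-in for LayerActivationCollector, used only to show that
# try/finally guarantees unpatching even when capture raises.
class DummyCollector:
    def __init__(self):
        self.patched = False

    def _patch_all_layers(self):
        self.patched = True

    def _unpatch_all_layers(self):
        self.patched = False

    def get_input_activations(self, layer, forward_loop):
        raise RuntimeError("forward failed mid-capture")


collector = DummyCollector()
collector._patch_all_layers()
try:
    try:
        collector.get_input_activations(None, None)
    finally:
        collector._unpatch_all_layers()  # runs even though capture raised
except RuntimeError:
    pass  # in the real test, pytest would surface this failure
```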
50-52: ⚠️ Potential issue | 🟠 Major
`output_meta` needs to be tracked per replayed batch.
`run` overwrites a single `output_meta`, and `skip` reuses that same structure for every later batch. With variable sequence lengths or a short last batch, skipped layers will synthesize dummy outputs with the wrong shape/structure on subsequent passes. Keep per-batch metadata aligned with the captured inputs and replay it in order instead of storing one shared slot per layer.
Also applies to: 173-180, 182-189
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/utils/activation_collector.py` around lines 50 - 52, The collector currently stores a single output_meta that is overwritten and reused across batches; change output_meta from a single slot to a per-batch structure (e.g., a list/deque) aligned with cached_inputs/collected_inputs so each captured input has its own output metadata; update the ActivationCollector fields (cached_inputs, collected_inputs, output_meta) and modify the run and skip methods to append/read the corresponding per-batch output_meta entry in the same order as inputs (use the same indexing/pop order as cached_inputs) so replayed batches use the exact metadata for that batch instead of a shared slot.
271-282: ⚠️ Potential issue | 🟠 Major
Validate state transitions before flipping modes.
If a layer never captured inputs or never produced replay metadata, `_set_layer_states()` still moves it into `run`/`skip` and the failure only appears later inside `_patched_forward` after some state has already been mutated. Reject empty `prev.collected_inputs` and missing `done.output_meta` here so the error points at the broken transition. `test_mode_transitions_across_calibration_steps()` will need to seed realistic capture state once this guard exists.
Suggested guardrails

 if layer_idx > 1:
     done = self._decoder_layers[layer_idx - 2]._seq_calib
+    if done.output_meta is None:
+        raise RuntimeError(
+            f"Layer {done.name} cannot enter 'skip' before a successful replay."
+        )
     done.mode = "skip"
     # output_meta is intentionally kept: skip mode needs it to produce
     # correctly shaped zero-filled outputs for the parent forward.
     done.cached_inputs.clear()
 if layer_idx > 0:
     prev = self._decoder_layers[layer_idx - 1]._seq_calib
+    if not prev.collected_inputs:
+        raise RuntimeError(
+            f"Layer {prev.name} has no captured inputs to replay."
+        )
     prev.mode = "run"
     prev.cached_inputs = deque(prev.collected_inputs)
     prev.collected_inputs = []

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/utils/activation_collector.py` around lines 271 - 282, In _set_layer_states (the block that flips _seq_calib.mode for self._decoder_layers), validate state before mutating: when setting done = self._decoder_layers[layer_idx - 2]._seq_calib ensure done.output_meta is present (raise a clear error/ValueError if missing) and when preparing prev = self._decoder_layers[layer_idx - 1]._seq_calib ensure prev.collected_inputs is non-empty (raise a clear error/ValueError if empty) instead of proceeding to set modes and swap cached_inputs; this makes failures point to the bad transition rather than surfacing later in _patched_forward.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/utils/__init__.py`:
- Around line 18-34: Add explicit imports for every name listed in __all__
(e.g., EXPORT_MODE, convert_quantization_axis_to_reduce_axis, export_torch_mode,
is_quantized, is_quantized_column_parallel_linear, is_quantized_linear,
is_quantized_row_parallel_linear, reduce_amax, reduce_sum, replace_function,
update_quant_cfg_with_kv_cache_quant, weight_attr_names) at the top of
modelopt/torch/quantization/utils/__init__.py (you may keep the existing from
.core_utils import * for runtime surface), and include LayerActivationCollector
in the __all__ list so mypy and Ruff can resolve the exports; ensure the
explicit imports reference the correct module paths (e.g., from
.activation_collector import LayerActivationCollector) and update the __all__
array to contain all those symbol names.
---
Duplicate comments:
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 1862-1870: The loop over transformer_layers currently proceeds
even if input_getter.get_input_activations(transformer_layer, forward_loop)
returns an empty list, causing calib_func to be called with a no-op
_layer_forward_loop; change this by checking layer_inputs after calling
input_getter.get_input_activations and raise a clear exception (e.g.,
ValueError) when it's empty (reference symbols: transformer_layers,
input_getter.get_input_activations, layer_inputs, _layer_forward_loop,
calib_func, forward_loop) so the caller/test (test_seq_calib_empty_forward_loop)
can expect/fail accordingly.
In `@modelopt/torch/quantization/utils/activation_collector.py`:
- Around line 50-52: The collector currently stores a single output_meta that is
overwritten and reused across batches; change output_meta from a single slot to
a per-batch structure (e.g., a list/deque) aligned with
cached_inputs/collected_inputs so each captured input has its own output
metadata; update the ActivationCollector fields (cached_inputs,
collected_inputs, output_meta) and modify the run and skip methods to
append/read the corresponding per-batch output_meta entry in the same order as
inputs (use the same indexing/pop order as cached_inputs) so replayed batches
use the exact metadata for that batch instead of a shared slot.
- Around line 271-282: In _set_layer_states (the block that flips
_seq_calib.mode for self._decoder_layers), validate state before mutating: when
setting done = self._decoder_layers[layer_idx - 2]._seq_calib ensure
done.output_meta is present (raise a clear error/ValueError if missing) and when
preparing prev = self._decoder_layers[layer_idx - 1]._seq_calib ensure
prev.collected_inputs is non-empty (raise a clear error/ValueError if empty)
instead of proceeding to set modes and swap cached_inputs; this makes failures
point to the bad transition rather than surfacing later in _patched_forward.
In `@tests/unit/torch/quantization/test_sequential_calibrate.py`:
- Around line 183-185: The tests call collector._patch_all_layers() then
collector.get_input_activations(...) and finally collector._unpatch_all_layers()
but lack a try/finally so failures leave patches active; update the test(s) that
use _patch_all_layers/_unpatch_all_layers (including the block invoking
get_input_activations on model.layers[0] and the duplicated block around lines
626-630) to wrap the call to get_input_activations inside a try/finally where
collector._unpatch_all_layers() is invoked in the finally clause to guarantee
cleanup even on exceptions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6423b2c7-8f29-433a-85a5-59b87dfcb39f
📒 Files selected for processing (10)
- modelopt/torch/quantization/model_calib.py
- modelopt/torch/quantization/plugins/huggingface.py
- modelopt/torch/quantization/utils/__init__.py
- modelopt/torch/quantization/utils/activation_collector.py
- modelopt/torch/quantization/utils/core_utils.py
- modelopt/torch/utils/network.py
- tests/unit/torch/quantization/plugins/test_huggingface.py
- tests/unit/torch/quantization/test_calib.py
- tests/unit/torch/quantization/test_sequential_calibrate.py
- tests/unit/torch/quantization/test_utils.py
💤 Files with no reviewable changes (1)
- modelopt/torch/utils/network.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/torch/quantization/test_calib.py
from .core_utils import *  # noqa: F401,F403
from .activation_collector import LayerActivationCollector  # noqa: F401

__all__ = [
    "EXPORT_MODE",
    "convert_quantization_axis_to_reduce_axis",
    "export_torch_mode",
    "is_quantized",
    "is_quantized_column_parallel_linear",
    "is_quantized_linear",
    "is_quantized_row_parallel_linear",
    "reduce_amax",
    "reduce_sum",
    "replace_function",
    "update_quant_cfg_with_kv_cache_quant",
    "weight_attr_names",
]
Make the package exports explicit here.
This hunk is the source of both CI failures: Ruff cannot resolve the names referenced from __all__ through the star import, and mypy does not see LayerActivationCollector as exported because it is missing from __all__. Keep the star import if you still need the broader runtime surface, but explicitly import the symbols referenced in __all__ and add LayerActivationCollector there.
One minimal way to satisfy both tools without shrinking the current runtime surface
from .core_utils import * # noqa: F401,F403
-from .activation_collector import LayerActivationCollector # noqa: F401
+from .core_utils import (
+ EXPORT_MODE,
+ convert_quantization_axis_to_reduce_axis,
+ export_torch_mode,
+ is_quantized,
+ is_quantized_column_parallel_linear,
+ is_quantized_linear,
+ is_quantized_row_parallel_linear,
+ reduce_amax,
+ reduce_sum,
+ replace_function,
+ update_quant_cfg_with_kv_cache_quant,
+ weight_attr_names,
+)
+from .activation_collector import LayerActivationCollector
__all__ = [
"EXPORT_MODE",
"convert_quantization_axis_to_reduce_axis",
"export_torch_mode",
@@
"replace_function",
"update_quant_cfg_with_kv_cache_quant",
"weight_attr_names",
+ "LayerActivationCollector",
As per coding guidelines for `**/*.py`: use the Ruff linter and the mypy type checker for Python code, both configured in `pyproject.toml`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/utils/__init__.py` around lines 18 - 34, Add
explicit imports for every name listed in __all__ (e.g., EXPORT_MODE,
convert_quantization_axis_to_reduce_axis, export_torch_mode, is_quantized,
is_quantized_column_parallel_linear, is_quantized_linear,
is_quantized_row_parallel_linear, reduce_amax, reduce_sum, replace_function,
update_quant_cfg_with_kv_cache_quant, weight_attr_names) at the top of
modelopt/torch/quantization/utils/__init__.py (you may keep the existing from
.core_utils import * for runtime surface), and include LayerActivationCollector
in the __all__ list so mypy and Ruff can resolve the exports; ensure the
explicit imports reference the correct module paths (e.g., from
.activation_collector import LayerActivationCollector) and update the __all__
array to contain all those symbol names.
What does this PR do?
Type of change: New feature
The current sequential calibration support has O(N^2) complexity for collecting the updated activations for a decoder layer. An earlier fix adopted a modular/plugin-based approach that used hooks to capture the updated activations by running forward on the previous decoder layer with its cached inputs. However, this runs into an issue with nested modules: logic in the parent module may need to be replicated in the lower-level modules to ensure equivalence. For example, in the Nemotron model, the parent module NemotronHModel creates and selects the appropriate mask based on the decoder layer type (mamba vs. attention).
This PR implements a more generic solution for sequential calibration by collecting activations through the full model forward, ensuring that all parent-module logic is preserved. We use a "state" attribute on the modules to indicate whether to recompute or skip a layer during the model forward, which avoids redundant computation when gathering updated activations.
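The capture/run/skip mechanism described above can be sketched as a toy state machine. This is an illustration only: the real implementation patches each decoder layer's forward and handles args/kwargs and output metadata, which the sketch omits.

```python
from collections import deque


class EarlyStop(Exception):
    """Raised after capture so deeper layers are not executed."""


class ToyLayer:
    def __init__(self):
        self.mode = "capture"   # capture -> run -> skip over the sweep
        self.collected = []     # inputs recorded during capture
        self.cached = deque()   # inputs replayed during run

    def forward(self, x):
        if self.mode == "capture":
            self.collected.append(x)
            raise EarlyStop
        if self.mode == "run":
            x = self.cached.popleft()  # replay the cached input
        return x + 1                   # stand-in for the real layer math


l0, l1 = ToyLayer(), ToyLayer()


def model_forward(x):
    for layer in (l0, l1):
        x = layer.forward(x)
    return x


def capture(target, batches):
    for x in batches:
        try:
            model_forward(x)
        except EarlyStop:
            pass
    return list(target.collected)


# Sweep step 1: full model forwards capture layer 0's inputs.
inputs0 = capture(l0, [10, 20])
# ... calibrate layer 0 here using inputs0 ...

# Sweep step 2: layer 0 replays its cached inputs ("run") so layer 1
# captures activations that reflect layer 0's updated computation.
l0.mode, l0.cached = "run", deque(l0.collected)
inputs1 = capture(l1, [10, 20])
```

Because the capture always happens through the full model forward, any parent-module logic (such as mask selection per decoder-layer type) runs unchanged.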
The overall flow is as follows
Usage
# Add a code snippet demonstrating how to use thisTesting
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, using `torch.load(..., weights_only=True)`, avoiding `pickle`, etc.).
Additional Information
Summary by CodeRabbit
New Features
Improvements
Tests