Conversation
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
Codecov Report

❌ Patch coverage is

```diff
@@            Coverage Diff             @@
##             main    #1019      +/-  ##
==========================================
- Coverage   70.25%   70.22%    -0.04%
==========================================
  Files         220      220
  Lines       25368    25391      +23
==========================================
+ Hits        17822    17830       +8
- Misses       7546     7561      +15
```
Summary
Adds an `apply_sparse24: bool` config option to the existing `flash_skip_softmax` method. When enabled, a 2:4 structured sparsity mask (top-2 of every 4 elements along `seq_k`) is AND-ed with the skip-softmax block mask in both prefill and decode phases.
This is a pure PyTorch-level feature for research and analysis — not a performance optimization. It allows studying the interaction between block-level and 2:4 structured sparsity patterns.
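As a rough sketch of the masking described above (function and variable names here are hypothetical, not necessarily those used in the PR), the 2:4 mask can be built with `topk` over groups of 4 along `seq_k` and then AND-ed with the block-level mask:

```python
import torch

def sparse24_mask(scores: torch.Tensor) -> torch.Tensor:
    """Boolean mask keeping the top-2 magnitudes of every 4 elements
    along the last dim (seq_k). Assumes seq_k is divisible by 4."""
    *lead, seq_k = scores.shape
    groups = scores.reshape(*lead, seq_k // 4, 4)
    # indices of the two largest-magnitude entries in each group of 4
    top2 = groups.abs().topk(2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, top2, True)
    return mask.reshape(*lead, seq_k)

# Hypothetical usage: combine with a block-level skip-softmax mask.
scores = torch.randn(2, 8, 16)                      # (batch, seq_q, seq_k)
block_mask = torch.ones(2, 8, 16, dtype=torch.bool)  # stand-in block mask
combined = block_mask & sparse24_mask(scores)
```

With an all-True block mask, exactly 2 of every 4 positions along `seq_k` survive in `combined`; a sparser block mask can only remove further entries, matching the AND semantics above.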
Changes