Conversation
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
Codecov Report

❌ Patch coverage is

```diff
@@            Coverage Diff             @@
##             main    #1019      +/-  ##
==========================================
- Coverage   70.25%   70.22%    -0.04%
==========================================
  Files         220      220
  Lines       25368    25391      +23
==========================================
+ Hits        17822    17830       +8
- Misses       7546     7561      +15
```
Summary
Adds an `apply_sparse24: bool` config option to the existing `flash_skip_softmax` method. When enabled, a 2:4 structured sparsity mask (top-2 of every 4 elements along `seq_k`) is AND-ed with the skip-softmax block mask in both prefill and decode phases.
This is a pure PyTorch-level feature for research and analysis — not a performance optimization. It allows studying the interaction between block-level and 2:4 structured sparsity patterns.
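As a rough sketch of the masking described above (function and variable names here are hypothetical, not necessarily those used in the PR), the 2:4 mask can be built with `topk` over groups of 4 along `seq_k` and then AND-ed with the block-level mask:

```python
import torch

def sparse24_mask(scores: torch.Tensor) -> torch.Tensor:
    """Boolean mask keeping the top-2 magnitudes of every 4 elements
    along the last dim (seq_k). Assumes seq_k is divisible by 4."""
    *lead, seq_k = scores.shape
    groups = scores.reshape(*lead, seq_k // 4, 4)
    # indices of the two largest-magnitude entries in each group of 4
    top2 = groups.abs().topk(2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, top2, True)
    return mask.reshape(*lead, seq_k)

# Hypothetical usage: combine with a block-level skip-softmax mask.
scores = torch.randn(2, 8, 16)                      # (batch, seq_q, seq_k)
block_mask = torch.ones(2, 8, 16, dtype=torch.bool)  # stand-in block mask
combined = block_mask & sparse24_mask(scores)
```

With an all-True block mask, exactly 2 of every 4 positions along `seq_k` survive in `combined`; a sparser block mask can only remove further entries, matching the AND semantics above.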
Changes