CHANGELOG 0.10.0: clarify sm100f fwd/bwd limits and CUEQ_TORCH_COMPILE modes #276

Open: hsadasivan wants to merge 2 commits into main from changelog-sm100f-torch-compile

@hsadasivan
Collaborator

Summary

  • Clarifies that the new sm100f (CC 10.0/10.3) improvement applies specifically to the forward kernel (hidden_dim ≤ 256); the backward acceleration limit (hidden_dim ≤ 128) is unchanged.
  • Expands the CUEQ_TORCH_COMPILE note to document that it applies to the forward pass only, and lists the supported integer modes: 1 = "default", 2 = "reduce-overhead", 3 = "max-autotune", 4 = "max-autotune-no-cudagraphs".
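
For reviewers who want to check the mode list at a glance, here is a minimal sketch of the integer-to-mode mapping described above. The helper `resolve_compile_mode` and the dictionary name are hypothetical and are not taken from the library. Returning `None` for unset or unlisted values is also an assumption, since this PR does not say how such values are handled.

```python
import os

# Integer-to-mode mapping as listed in the PR summary. The strings are the
# standard torch.compile mode names.
CUEQ_TORCH_COMPILE_MODES = {
    1: "default",
    2: "reduce-overhead",
    3: "max-autotune",
    4: "max-autotune-no-cudagraphs",
}


def resolve_compile_mode(raw):
    """Hypothetical helper: map a CUEQ_TORCH_COMPILE value to a mode name.

    Returns None for unset, non-integer, or unlisted values. That fallback
    is an assumption of this sketch, not documented library behavior.
    """
    if raw is None:
        return None
    try:
        return CUEQ_TORCH_COMPILE_MODES.get(int(raw))
    except ValueError:
        return None


# Read the variable from the environment and show the resolved mode name.
mode = resolve_compile_mode(os.environ.get("CUEQ_TORCH_COMPILE"))
print(mode)
```

Per the summary, the setting affects the forward pass only, so a resolved mode would not change how the backward pass runs.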

Test plan

  • CHANGELOG-only change; no functional code modified

@copy-pr-bot
copy-pr-bot Bot commented Apr 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hsadasivan hsadasivan requested a review from phiandark April 22, 2026 03:44
2 participants