Extend regular NAX tuning to gen-17 g devices #3295

Open
lentil32 wants to merge 1 commit into ml-explore:main from lentil32:fix-3196-m5-regular-nax-routing

lentil32 commented Mar 22, 2026

Proposed changes

This may address #3196 on M5 devices that route through the g regular NAX path.

Benchmarks

Measured manually on an Apple M5 Pro with the issue-shaped BF16 1280x1280 addmm / matmul microbenchmark (30 warmup iterations, 1000 timed iterations), using explicit BF16 flags.

This machine reports architecture = applegpu_g17s, so the real-device path is effectively unchanged because it already takes the existing tuned route:

  • addmm: 0.3073 ms -> 0.3059 ms (-0.5%)
  • matmul: 0.3108 ms -> 0.3099 ms (-0.3%)
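The measurement procedure described above (30 warmup iterations, then 1000 timed iterations) can be sketched as a small timing helper. This is a minimal illustration, not the script used for the numbers in this PR; `time_ms` and its parameters are hypothetical names introduced here.

```python
import time


def time_ms(fn, warmup=30, iters=1000):
    """Return the mean wall-clock time per call of fn, in milliseconds.

    Hypothetical helper: runs `warmup` untimed calls first, then times
    `iters` calls as one batch and divides by the iteration count.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1e3 / iters
```

Because MLX evaluates lazily, the callable passed as `fn` would need to force evaluation of the addmm or matmul result (for example through `mx.eval`) so that the timed region covers the actual GPU work.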

To validate the new g route directly, I reran the same workload with MLX_METAL_GPU_ARCH=applegpu_g17g:

  • addmm: 0.3620 ms -> 0.3132 ms (-13.5%, 1.16x faster)
  • matmul: 0.3495 ms -> 0.3208 ms (-8.2%, 1.09x faster)
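The percentage and speedup figures above follow directly from the before/after timings. A short check, with `pct_change` and `speedup` as hypothetical helper names used only for illustration:

```python
def pct_change(before_ms, after_ms):
    """Relative change in percent, rounded to one decimal (negative means faster)."""
    return round((after_ms - before_ms) / before_ms * 100, 1)


def speedup(before_ms, after_ms):
    """Ratio of old time to new time, rounded to two decimals."""
    return round(before_ms / after_ms, 2)


# Timings reported for the forced applegpu_g17g route.
print(pct_change(0.3620, 0.3132), speedup(0.3620, 0.3132))  # addmm
print(pct_change(0.3495, 0.3208), speedup(0.3495, 0.3208))  # matmul
```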

These results suggest the change improves the targeted g path, but I have not yet validated it on a real device that reports applegpu_g17g.

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

lentil32 force-pushed the fix-3196-m5-regular-nax-routing branch from 521be5c to ebc1ab4 on March 22, 2026 12:36
zcbenz requested a review from jagrit06 on March 22, 2026 23:11