Add challenge 97: Softcap Attention (Medium) #254

Open

claude[bot] wants to merge 2 commits into main from add-challenge-97-softcap-attention

Conversation

claude[bot] (Contributor) commented Apr 24, 2026

Summary

  • Adds challenge 97: Softcap Attention (Medium) — multi-head self-attention with tanh logit soft-capping applied to pre-softmax scores
  • Models a real inference kernel used in production LLMs such as Gemma 2 (softcap=50) and Grok-1.5; complements existing attention variants (MHA, causal, ALiBi, GQA, sliding-window, linear) without duplicating any merged challenge or pending PR
  • Solver must fuse softcap * tanh(scores / softcap) into the scaled dot-product attention pipeline before the row-wise softmax
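
A minimal PyTorch sketch of the operation being fused, assuming a (batch, heads, seq, head_dim) layout and a Gemma-2-style softcap of 50; the function name and shapes are illustrative, not the challenge's official reference:

```python
# Softcap attention sketch (assumptions: fp32 tensors, no masking, no dropout).
import math
import torch

def softcap_attention(q, k, v, softcap: float = 50.0):
    """q, k, v: (batch, heads, seq, head_dim) self-attention inputs."""
    head_dim = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)  # scaled dot-product logits
    scores = softcap * torch.tanh(scores / softcap)         # tanh logit soft-capping
    weights = torch.softmax(scores, dim=-1)                  # row-wise softmax
    return weights @ v

# Example with the performance-test head count (h=16, head_dim=64 -> d_model=1024).
q = torch.randn(1, 16, 128, 64)
k = torch.randn(1, 16, 128, 64)
v = torch.randn(1, 16, 128, 64)
out = softcap_attention(q, k, v, softcap=50.0)
print(out.shape)  # torch.Size([1, 16, 128, 64])
```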

Test plan

  • pre-commit run --all-files passes on all new files
  • challenge.py imports cleanly; generate_functional_test() returns 10 cases (edge 1–4, zero input, mixed negatives, power-of-2, non-power-of-2, realistic)
  • generate_performance_test() fits comfortably within 16 GB VRAM (N=2048, d_model=1024, h=16 → ~32 MB of tensors; the kernel uses ~9 KB of shared memory per block); see the arithmetic sketch after this list
  • Validated end-to-end on live platform via scripts/run_challenge.py ... --action submit with a T4 CUDA solution: All tests passed (sample + functional + performance)
  • Starter files for all six frameworks compile/run but do not produce correct output
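
The ~32 MB figure follows from the four (N, d_model) activations, assuming fp32 inputs and that the fused kernel never materializes the per-head N×N score matrices in global memory (it streams them through shared memory):

```python
# Rough activation footprint for the performance test
# (assumptions: fp32, and only Q, K, V, and the output live in global memory).
N, d_model, bytes_per_elem = 2048, 1024, 4
per_tensor_mib = N * d_model * bytes_per_elem / 2**20  # 8 MiB each
print(4 * per_tensor_mib)                               # Q, K, V, output -> 32.0 MiB
```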

🤖 Generated with Claude Code

Implement multi-head self-attention with tanh logit soft-capping
(as used in Gemma 2 and other modern LLMs). Soft-capping applies
softcap * tanh(scores / softcap) to the pre-softmax attention
scores to bound their magnitude; this models a real-world
inference kernel not covered by existing attention challenges.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
