Add challenge 101: Adaptive Layer Normalization (Medium) #267
Open
claude[bot] wants to merge 1 commit into
Conversation
Implements the AdaLN modulation kernel that conditions diffusion transformers (DiT, Stable Diffusion 3, FLUX, OpenSora, SiT). Each token is layer-normalized along the feature dimension and then modulated by a per-batch scale and shift vector that is broadcast across the sequence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Adds challenge 101: Adaptive Layer Normalization (AdaLN), the conditioning kernel used in modern Diffusion Transformers (DiT, Stable Diffusion 3, FLUX, OpenSora, SiT). For each `(b, n)` token, the solver layer-normalizes along the feature dimension and then modulates by a per-batch `scale` and `shift` vector that is broadcast across the sequence:

`y[b, n, :] = scale[b, :] * LayerNorm(x[b, n, :]) + shift[b, :]`

The per-batch (rather than per-channel) modulation is what distinguishes AdaLN from BatchNorm/RMSNorm and is the core mechanism by which time-step / class embeddings condition diffusion transformers.
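A minimal PyTorch sketch of the computation described above, for illustration only: the function name `adaln_ref` is hypothetical, and the modulation is applied exactly as written in the formula (`scale * LN(x) + shift`; some DiT implementations use `1 + scale` instead):

```python
import torch
import torch.nn.functional as F

def adaln_ref(x: torch.Tensor, scale: torch.Tensor, shift: torch.Tensor,
              eps: float = 1e-5) -> torch.Tensor:
    """AdaLN sketch: layer-normalize over D, then per-batch modulation.

    x:     (B, N, D)
    scale: (B, D), broadcast across the sequence dimension N
    shift: (B, D), broadcast likewise
    """
    # Plain layer norm along the feature dim; no learned affine params.
    y = F.layer_norm(x, normalized_shape=(x.shape[-1],), eps=eps)
    # unsqueeze(1) turns (B, D) into (B, 1, D) so it broadcasts over N.
    return scale.unsqueeze(1) * y + shift.unsqueeze(1)
```

The per-batch broadcast is the whole trick: one `(B, D)` pair of vectors conditions all `N` tokens of each batch element.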
- Inputs: `X (B, N, D)`, `scale (B, D)`, `shift (B, D)`; output `(B, N, D)`
- Fixed `eps = 1e-5`, `float32` throughout
- `B = 16, N = 4,096, D = 1,152` (DiT-XL/2 inspired, ≈ 600 MB working set; see the quick check below)
- Validated end-to-end against the live platform (T4) with a hand-written CUDA reference solution; all functional + performance tests pass.
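As a sanity check on the quoted ≈ 600 MB figure, counting just the input and output tensors (the `scale`/`shift` vectors are negligible) lands close to it:

```python
B, N, D = 16, 4096, 1152
bytes_per_elem = 4  # float32

tensor_mib = B * N * D * bytes_per_elem / 2**20   # one (B, N, D) tensor
mod_mib = 2 * B * D * bytes_per_elem / 2**20      # scale + shift

print(f"X or output: {tensor_mib:.0f} MiB each")                  # 288 MiB
print(f"total working set: {2 * tensor_mib + mod_mib:.1f} MiB")   # ~576 MiB
```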
Reference impl matches `torch.nn.functional.layer_norm` to within float32 precision on the example.
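A sketch of what that comparison might look like, checking `F.layer_norm` against a manual two-pass mean/variance computation (the kind a hand-written kernel performs per `(b, n)` row); the `rtol`/`atol` values here are illustrative assumptions, not the platform's actual tolerances:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, N, D, eps = 16, 4096, 1152, 1e-5
x = torch.randn(B, N, D)
scale = torch.randn(B, D)
shift = torch.randn(B, D)

# Library path: layer norm along D, then per-batch modulation.
ref = scale.unsqueeze(1) * F.layer_norm(x, (D,), eps=eps) + shift.unsqueeze(1)

# Manual two-pass path: mean, biased variance, normalize, modulate.
mu = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)
out = scale.unsqueeze(1) * (x - mu) * torch.rsqrt(var + eps) + shift.unsqueeze(1)

# fp32 reductions accumulate rounding error, so compare with small
# tolerances rather than bitwise equality.
torch.testing.assert_close(out, ref, rtol=1e-5, atol=1e-5)
```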
Test plan

- `pre-commit run --all-files` passes
- `python scripts/run_challenge.py challenges/medium/101_adaptive_layer_normalization --language cuda --action run` → example test passes
- `python scripts/run_challenge.py challenges/medium/101_adaptive_layer_normalization --language cuda --action submit` → all functional + performance tests pass on T4
- Reference solution: `F.layer_norm` + per-batch modulation in `solve`
- `generate_example_test()` output verified

🤖 Generated with Claude Code