
Add challenge 101: Adaptive Layer Normalization (Medium) #267

Open
claude[bot] wants to merge 1 commit into main from add-challenge-101-adaptive-layer-normalization

Conversation

@claude (Contributor) commented May 15, 2026

Summary

Adds challenge 101 — Adaptive Layer Normalization (AdaLN), the conditioning kernel used in modern Diffusion Transformers (DiT, Stable Diffusion 3, FLUX, OpenSora, SiT). For each (b, n) token, the solver layer-normalizes along the feature dimension and then modulates by a per-batch scale and shift vector that is broadcast across the sequence:

out[b, n, d] = LN(X[b, n, :])[d] * (1 + scale[b, d]) + shift[b, d]
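
where LN is the standard per-token LayerNorm, taking the mean and (biased) variance over the D features of that token:

LN(X[b, n, :])[d] = (X[b, n, d] - mean(X[b, n, :])) / sqrt(var(X[b, n, :]) + eps)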

The per-batch (rather than per-channel) modulation is what distinguishes AdaLN from BatchNorm/RMSNorm and is the core mechanism by which time-step / class embeddings condition diffusion transformers.

  • Inputs: X (B, N, D), scale (B, D), shift (B, D)
  • Output: (B, N, D), fixed eps = 1e-5, float32 throughout
  • Performance test: B = 16, N = 4,096, D = 1,152 (DiT-XL/2 inspired, ≈ 600 MB working set)
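
A minimal CUDA sketch of this computation, assuming one thread block per (b, n) token; the function name, block size, and atomic-based reduction are this write-up's illustrative choices, not the repository's reference solution:

```cuda
#include <cuda_runtime.h>

// Sketch: one thread block per (b, n) token; threads stride over the
// D features. The shared-memory atomicAdd reduction is chosen for
// brevity; a warp-shuffle tree reduction would be faster in practice.
__global__ void adaln_kernel(const float* __restrict__ x,      // (B, N, D)
                             const float* __restrict__ scale,  // (B, D)
                             const float* __restrict__ shift,  // (B, D)
                             float* __restrict__ out,          // (B, N, D)
                             int N, int D, float eps) {
    int token = blockIdx.x;            // token index = b * N + n
    int b = token / N;                 // batch index for scale/shift lookup
    const float* row = x + (size_t)token * D;
    float* orow = out + (size_t)token * D;

    __shared__ float s_sum, s_sqsum;
    if (threadIdx.x == 0) { s_sum = 0.0f; s_sqsum = 0.0f; }
    __syncthreads();

    // Per-thread partial sums of x and x^2 over the feature dimension.
    float sum = 0.0f, sqsum = 0.0f;
    for (int d = threadIdx.x; d < D; d += blockDim.x) {
        float v = row[d];
        sum += v;
        sqsum += v * v;
    }
    atomicAdd(&s_sum, sum);
    atomicAdd(&s_sqsum, sqsum);
    __syncthreads();

    float mean    = s_sum / D;
    float inv_std = rsqrtf(s_sqsum / D - mean * mean + eps);  // biased variance

    // Normalize, then apply the per-batch (1 + scale) and shift.
    for (int d = threadIdx.x; d < D; d += blockDim.x) {
        float ln = (row[d] - mean) * inv_std;
        orow[d]  = ln * (1.0f + scale[b * D + d]) + shift[b * D + d];
    }
}

// Hypothetical launch at the performance-test shapes:
//   adaln_kernel<<<B * N, 256>>>(x, scale, shift, out, N, D, 1e-5f);
```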

Validated end-to-end against the live platform (T4) with a hand-written CUDA reference solution — all functional + performance tests pass. Reference impl matches torch.nn.functional.layer_norm to within float32 precision on the example.

Test plan

  • pre-commit run --all-files passes
  • python scripts/run_challenge.py challenges/medium/101_adaptive_layer_normalization --language cuda --action run → example test passes
  • python scripts/run_challenge.py challenges/medium/101_adaptive_layer_normalization --language cuda --action submit → all functional + performance tests pass on T4
  • Reference impl produces values matching F.layer_norm + per-batch modulation
  • All 6 starter files present, each with a single parameter-description comment and an empty solve function
  • HTML example values match generate_example_test() output
  • Performance-test working set (~600 MB; arithmetic below) fits 5× over in the Tesla T4's 16 GB of VRAM
  • Number 101 is unclaimed by any open PR (the highest challenge number, existing or claimed by a PR, is 100)
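
Working-set arithmetic behind the ~600 MB figure: X and out are each 16 × 4,096 × 1,152 × 4 B ≈ 302 MB, so together ≈ 604 MB, while scale and shift contribute only 16 × 1,152 × 4 B ≈ 74 KB each.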

🤖 Generated with Claude Code

Implements the AdaLN modulation kernel that conditions diffusion
transformers (DiT, Stable Diffusion 3, FLUX, OpenSora, SiT). Each token
is layer-normalized along the feature dimension and then modulated by a
per-batch scale and shift vector that is broadcast across the sequence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>