feat(vllm): add Strix Halo vLLM image with gfx1150/1151 CI #101

Open

KerwinTsaiii wants to merge 6 commits into develop from feat/add-vllm-base-image

Conversation

@KerwinTsaiii
Collaborator

Build vLLM + AITER + ROCm flash-attention from source on top of auplc-base, targeted at gfx1151 (Strix Halo) and gfx1150.

  • dockerfiles/VLLM/: Dockerfile and helper scripts (build / server / bench / chat) plus the Welcome-vLLM-on-Strix-Halo notebook.
  • patch_aiter_headers.py + optCompilerConfig.gfx1151.json: source-level RDNA 3.5 fallbacks for AITER's CDNA-only ISA paths (vec_convert.h packed fp8/bf8, hip_reduce.h DPP row_bcast -> ds_swizzle).
  • patch_flash_attn_setup.py + patch_strix.py: build-system fixups for the AMD flash-attention fork on gfx1151.
  • dockerfiles/Makefile: 'make vllm' target wiring GPU_TARGET / VLLM_REF / MAX_JOBS / FLASH_ATTN_REF through to VLLM/build.sh.
  • .github/workflows/docker-build-vllm.yml: matrix CI for gfx1150 + gfx1151, publishing ghcr.io/amdresearch/auplc-vllm:{tag}-gfx115x (and unsuffixed aliases for the default gfx1151 target). Runs sequentially (max-parallel: 1) to fit ubuntu-latest's 7 GB RAM / 14 GB disk envelope; cache scoped per GPU (sketched below).
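
For orientation, the matrix/serialization shape that last bullet describes reads roughly like the sketch below. The step list, action versions, and tag pattern are illustrative assumptions, not the actual contents of docker-build-vllm.yml:

```yaml
jobs:
  build-vllm:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 1                  # build gfx1150, then gfx1151, sequentially
      matrix:
        gpu_target: [gfx1150, gfx1151]
    steps:
      - uses: actions/checkout@v4
      - uses: docker/build-push-action@v6
        with:
          file: dockerfiles/VLLM/Dockerfile
          build-args: |
            GPU_TARGET=${{ matrix.gpu_target }}
          tags: ghcr.io/amdresearch/auplc-vllm:latest-${{ matrix.gpu_target }}
          # cache scoped per GPU target so the two builds don't evict each other
          cache-from: type=gha,scope=vllm-${{ matrix.gpu_target }}
          cache-to: type=gha,mode=max,scope=vllm-${{ matrix.gpu_target }}
```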

Checklist

  • Code follows project style guidelines
  • Changes are backward compatible
  • Tested on local Kubernetes cluster
  • Documentation links updated

KerwinTsaiii and others added 6 commits May 12, 2026 22:29
Build vLLM + AITER + ROCm flash-attention from source on top of
auplc-base, targeted at gfx1151 (Strix Halo) and gfx1150.

* dockerfiles/VLLM/: Dockerfile and helper scripts (build / server /
  bench / chat) plus the Welcome-vLLM-on-Strix-Halo notebook.
* patch_aiter_headers.py + optCompilerConfig.gfx1151.json: source-level
  RDNA 3.5 fallbacks for AITER's CDNA-only ISA paths (vec_convert.h
  packed fp8/bf8, hip_reduce.h DPP row_bcast -> ds_swizzle).
* patch_flash_attn_setup.py + patch_strix.py: build-system fixups for
  the AMD flash-attention fork on gfx1151.
* dockerfiles/Makefile: 'make vllm' target wiring GPU_TARGET /
  VLLM_REF / MAX_JOBS / FLASH_ATTN_REF through to VLLM/build.sh.
* .github/workflows/docker-build-vllm.yml: matrix CI for
  gfx1150 + gfx1151, publishing
  ghcr.io/amdresearch/auplc-vllm:{tag}-gfx115x (and unsuffixed aliases
  for the default gfx1151 target). Runs sequentially (max-parallel: 1)
  to fit ubuntu-latest's 7 GB RAM / 14 GB disk envelope; cache scoped
  per GPU.

Co-authored-by: Cursor <cursoragent@cursor.com>
GHA evaluates `matrix:` before other contexts, so filtering matrix
entries from a job-level `if: ... matrix.gpu_target` raised
"Unrecognized named-value: 'matrix'" and the workflow shipped as
invalid (Actions UI fell back to the filename).

Replace the post-hoc filter with a tiny `resolve-matrix` job that
emits a JSON array based on workflow_dispatch input, and feed it
back to `build-vllm` via `fromJSON(needs.resolve-matrix.outputs.gpu_targets)`.
Push / pull_request keep building both gfx1150 + gfx1151; manual
runs with `gpu_target=gfx1151` (or 1150) build only that one.

Co-authored-by: Cursor <cursoragent@cursor.com>
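
Assembled from the description above, the pattern looks roughly like this. Only the job/output names and the fromJSON hand-off come from the commit message; the workflow_dispatch input wiring and step bodies are assumptions:

```yaml
on:
  push:
  pull_request:
  workflow_dispatch:
    inputs:
      gpu_target:
        description: "Optional: build a single GPU target (gfx1150 or gfx1151)"
        required: false

jobs:
  resolve-matrix:
    runs-on: ubuntu-latest
    outputs:
      gpu_targets: ${{ steps.pick.outputs.gpu_targets }}
    steps:
      - id: pick
        run: |
          # Manual runs with an input build only that target; push/PR build both.
          if [ -n "${{ github.event.inputs.gpu_target }}" ]; then
            echo 'gpu_targets=["${{ github.event.inputs.gpu_target }}"]' >> "$GITHUB_OUTPUT"
          else
            echo 'gpu_targets=["gfx1150","gfx1151"]' >> "$GITHUB_OUTPUT"
          fi

  build-vllm:
    needs: resolve-matrix
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 1
      matrix:
        # matrix values can be taken from a prior job's output via fromJSON
        gpu_target: ${{ fromJSON(needs.resolve-matrix.outputs.gpu_targets) }}
    steps:
      - run: echo "building for ${{ matrix.gpu_target }}"   # placeholder for the real build steps
```

Because `matrix:` is evaluated before other contexts within the same job, moving the choice into an upstream job sidesteps the "Unrecognized named-value" error entirely.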
Round out the vLLM image with the surrounding project glue that lets
users actually launch and measure it.

* runtime/values.yaml: register auplc-vllm as a hub-spawnable profile
  (vllm image + GPU resources + "vLLM Inference Server" card) and add
  it to the official / native-users / github-users access lists so it
  shows up next to the Course images in JupyterHub (see the sketch
  after this commit message).
* pyproject.toml: exclude dockerfiles/VLLM/patch_aiter_headers.py from
  ruff. The file is ~95 % C++ source held inside a Python string —
  ruff would chase indentation / trailing-space inside the embedded
  C++ forever; the wrapper Python is trivial enough to skip linting.
* benchmarks/run_qwen3_4b_throughput.sh: host-side wrapper that boots
  the auplc-vllm container, waits for /v1/models to settle, then
  docker-execs the in-image bench against loopback so client and
  server share the exact same vLLM build.
* benchmarks/.gitignore: keep run logs (server / bench JSON + .log)
  out of VCS; results live outside the repo.

Co-authored-by: Cursor <cursoragent@cursor.com>
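
A hub-spawnable profile entry of this kind usually takes the shape below. The key names follow the Zero-to-JupyterHub profileList convention; the image tag, description, and resource names are placeholders, not the repo's actual values.yaml:

```yaml
singleuser:
  profileList:
    - display_name: "vLLM Inference Server"
      description: "vLLM + AITER + flash-attention for Strix Halo"   # card text: assumed
      kubespawner_override:
        image: ghcr.io/amdresearch/auplc-vllm:latest                 # tag: assumed
        extra_resource_limits:
          amd.com/gpu: "1"   # request one AMD GPU via the device plugin
```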
* cell 3 (sanity-check): hoist `import torch` above the `print(...)`
  block so all imports sit at the top, then let ruff's isort group
  it under the third-party block. (E402 + I001)
* cell 14 (cleanup): drop the duplicate `import os` / `import signal`;
  cell 6 already pulled them into the notebook's global namespace,
  and the cleanup cell can't run standalone anyway (it depends on
  `server` from cell 6). (F811)

Co-authored-by: Cursor <cursoragent@cursor.com>
Mostly mechanical: collapse a one-line raise, expand the long
chat-completion user message into a multi-line dict, and a few
trivial whitespace touches across cells. No semantic changes.

Co-authored-by: Cursor <cursoragent@cursor.com>