feat(vllm): add Strix Halo vLLM image with gfx1150/1151 CI #101
Open
KerwinTsaiii wants to merge 6 commits into
Conversation
Build vLLM + AITER + ROCm flash-attention from source on top of
auplc-base, targeted at gfx1151 (Strix Halo) and gfx1150.
* dockerfiles/VLLM/: Dockerfile and helper scripts (build / server /
bench / chat) plus the Welcome-vLLM-on-Strix-Halo notebook.
* patch_aiter_headers.py + optCompilerConfig.gfx1151.json: source-level
RDNA 3.5 fallbacks for AITER's CDNA-only ISA paths (vec_convert.h
packed fp8/bf8, hip_reduce.h DPP row_bcast -> ds_swizzle).
* patch_flash_attn_setup.py + patch_strix.py: build-system fixups for
the AMD flash-attention fork on gfx1151.
* dockerfiles/Makefile: 'make vllm' target wiring GPU_TARGET /
VLLM_REF / MAX_JOBS / FLASH_ATTN_REF through to VLLM/build.sh.
* .github/workflows/docker-build-vllm.yml: matrix CI for
gfx1150 + gfx1151 publishing
ghcr.io/amdresearch/auplc-vllm:{tag}-gfx115x (and unsuffixed aliases
for the default gfx1151 target). Runs sequentially (max-parallel: 1)
to fit ubuntu-latest's 7 GB / 14 GB envelope; cache scoped per GPU.
Co-authored-by: Cursor <cursoragent@cursor.com>
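The AITER header patching can be pictured as a small source-to-source rewrite: find the CDNA-only DPP intrinsic and substitute an RDNA-reachable equivalent. A minimal sketch, assuming a regex-based approach — the pattern, the swizzle operand, and the helper name are illustrative placeholders, not the real replacement C++ carried by patch_aiter_headers.py:

```python
import re

# Hypothetical sketch: swap a CDNA-only DPP row_bcast reduction step for a
# ds_swizzle-based fallback available on RDNA 3.5 (gfx1150 / gfx1151).
# Pattern and replacement text are placeholders for illustration only.
CDNA_PATTERN = re.compile(r"__builtin_amdgcn_mov_dpp\([^)]*row_bcast[^)]*\)")
RDNA_FALLBACK = "__builtin_amdgcn_ds_swizzle(v, /* swizzle pattern */ 0x1f)"

def patch_header(source: str) -> str:
    """Replace a CDNA-only DPP row_bcast intrinsic with an RDNA-safe fallback."""
    return CDNA_PATTERN.sub(RDNA_FALLBACK, source)
```

The real patch works at the same level (text substitution over vec_convert.h / hip_reduce.h before the build), which keeps the upstream AITER sources untouched in git.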
GHA evaluates `matrix:` before other contexts, so filtering matrix entries from a job-level `if: ... matrix.gpu_target` raised "Unrecognized named-value: 'matrix'" and the workflow shipped as invalid (the Actions UI fell back to the filename).

Replace the post-hoc filter with a tiny `resolve-matrix` job that emits a JSON array based on the workflow_dispatch input, and feed it back to `build-vllm` via `fromJSON(needs.resolve-matrix.outputs.gpu_targets)`. Push / pull_request runs keep building both gfx1150 + gfx1151; manual runs with `gpu_target=gfx1151` (or 1150) build only that one.

Co-authored-by: Cursor <cursoragent@cursor.com>
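The resolve-matrix pattern described above looks roughly like this as workflow YAML — a sketch, not the actual contents of .github/workflows/docker-build-vllm.yml; step ids and the build command are assumed:

```yaml
on:
  push:
  pull_request:
  workflow_dispatch:
    inputs:
      gpu_target:
        type: choice
        options: [all, gfx1150, gfx1151]
        default: all

jobs:
  # Emit the matrix as a JSON array so build-vllm can consume it via fromJSON.
  resolve-matrix:
    runs-on: ubuntu-latest
    outputs:
      gpu_targets: ${{ steps.pick.outputs.gpu_targets }}
    steps:
      - id: pick
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ] && \
             [ "${{ inputs.gpu_target }}" != "all" ]; then
            echo 'gpu_targets=["${{ inputs.gpu_target }}"]' >> "$GITHUB_OUTPUT"
          else
            echo 'gpu_targets=["gfx1150","gfx1151"]' >> "$GITHUB_OUTPUT"
          fi

  build-vllm:
    needs: resolve-matrix
    strategy:
      max-parallel: 1   # stay inside the ubuntu-latest disk envelope
      matrix:
        gpu_target: ${{ fromJSON(needs.resolve-matrix.outputs.gpu_targets) }}
    runs-on: ubuntu-latest
    steps:
      - run: make vllm GPU_TARGET=${{ matrix.gpu_target }}
```

The key point is that `strategy.matrix` values may be computed from `needs.*.outputs`, while `matrix` itself is not yet available in a job-level `if`, which is why the filter has to move into its own job.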
Round out the vLLM image with the surrounding project glue that lets users actually launch and measure it.
* runtime/values.yaml: register auplc-vllm as a hub-spawnable profile (vllm image + GPU resources + "vLLM Inference Server" card) and add it to the official / native-users / github-users access lists so it shows up next to the Course images in JupyterHub.
* pyproject.toml: exclude dockerfiles/VLLM/patch_aiter_headers.py from ruff. The file is ~95 % C++ source held inside a Python string — ruff would chase indentation / trailing-space inside the embedded C++ forever; the wrapper Python is trivial enough to skip linting.
* benchmarks/run_qwen3_4b_throughput.sh: host-side wrapper that boots the auplc-vllm container, waits for /v1/models to settle, then docker-execs the in-image bench against loopback so client and server share the exact same vLLM build.
* benchmarks/.gitignore: keep run logs (server / bench JSON + .log) out of VCS; results live outside the repo.
Co-authored-by: Cursor <cursoragent@cursor.com>
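The "waits for /v1/models to settle" step in the benchmark wrapper is a bounded readiness poll. A minimal sketch in Python (the actual wrapper does this in shell with curl; the function name and the injectable `probe` hook here are illustrative):

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(url, timeout_s=300.0, interval_s=1.0, probe=None):
    """Poll `url` (e.g. http://127.0.0.1:8000/v1/models) until it answers
    HTTP 200 or the deadline passes. `probe` can be injected for testing."""
    if probe is None:
        def probe():
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    return resp.status == 200
            except (urllib.error.URLError, OSError):
                return False
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False
```

Polling the OpenAI-compatible /v1/models route (rather than sleeping a fixed interval) matters here because first-start model load time varies widely with weight size and disk speed.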
* cell 3 (sanity-check): hoist `import torch` above the `print(...)` block so all imports sit at the top, then let ruff's isort group it under the third-party block. (E402 + I001)
* cell 14 (cleanup): drop the duplicate `import os` / `import signal`; cell 6 already pulled them into the notebook's global namespace, and the cleanup cell can't run standalone anyway (it depends on `server` from cell 6). (F811)
Co-authored-by: Cursor <cursoragent@cursor.com>
Mostly mechanical: collapse a one-line raise, expand the long chat-completion user message into a multi-line dict, and a few trivial whitespace touches across cells. No semantic changes.
Co-authored-by: Cursor <cursoragent@cursor.com>