Skip to content

Bugfix - gpu_stream: remove ROCm build support, require CUDA with NVML#789

Open
polarG wants to merge 2 commits intomainfrom
dev/hongtaozhang/exclude-gpu-stream-from-rocm
Open

Bugfix - gpu_stream: remove ROCm build support, require CUDA with NVML#789
polarG wants to merge 2 commits intomainfrom
dev/hongtaozhang/exclude-gpu-stream-from-rocm

Conversation

@polarG
Copy link
Copy Markdown
Contributor

@polarG polarG commented Mar 14, 2026

Summary

The gpu_stream benchmark has NVIDIA-specific dependencies that prevent it from compiling on ROCm 6.3+. This change makes it CUDA-only, gracefully skipping the build with a warning on non-NVIDIA
environments.

Problem

The gpu_stream benchmark fails to compile on ROCm 6.3+ due to multiple NVIDIA-specific dependencies:

  1. nvml.h — NVIDIA Management Library header, used for querying actual memory clock rates. No HIP equivalent. Referenced in gpu_stream.cu and gpu_stream_utils.hpp.
  2. cuda.h in headers — Three .hpp files (gpu_stream.hpp, gpu_stream_kernels.hpp, gpu_stream_utils.hpp) directly include <cuda.h> and <cuda_runtime.h>. These headers are not processed by hipify-perl (only
    .cu source files are), so they fail to resolve on ROCm.
  3. Deprecated hipDeviceProp_t struct fields — The code accesses memoryBusWidth, memoryClockRate, and ECCEnabled from the device properties struct. These fields were removed from hipDeviceProp_t in ROCm
    6.3, causing compilation errors after hipification.

The existing ROCm path was marked as incomplete (# TODO: test for ROC) and was never fully functional on recent ROCm versions.

Changes

  • Removed the non-functional ROCm/HIP build path from gpu_stream/CMakeLists.txt
  • When CUDA is not found, prints a warning and returns gracefully instead of attempting a broken hipify build or raising FATAL_ERROR
  • No changes to the NVIDIA/CUDA build path — it continues to work as before

Impact

  • NVIDIA builds: No change — gpu_stream builds and installs normally
  • ROCm builds: gpu_stream is skipped with a warning message. Previously it would fail the entire make cppbuild step, blocking the Docker image build
  • Other benchmarks: Unaffected — build.sh continues to the next benchmark after gpu_stream returns

gpu_stream depends on NVML (nvidia-ml) for GPU monitoring which is
NVIDIA-specific and has no ROCm equivalent. Remove the ROCm/HIP build
path and skip the build cleanly when CUDA is not found.
@polarG polarG requested a review from a team as a code owner March 14, 2026 02:16
Copilot AI review requested due to automatic review settings March 14, 2026 02:16
@polarG polarG added bug Something isn't working CI/CD Continuous integration or deployment labels Mar 14, 2026
@polarG polarG self-assigned this Mar 14, 2026
@polarG polarG requested review from WenqingLan1 and guoshzhao March 14, 2026 02:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Removes the broken ROCm/HIP build path for the gpu_stream benchmark, making it CUDA-only with a graceful warning when CUDA is not found.

Changes:

  • Replaced the non-functional ROCm/HIP build path and FATAL_ERROR fallback with a WARNING message and return()
  • No changes to the CUDA build path

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.70%. Comparing base (6b8e810) to head (2bf019a).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #789   +/-   ##
=======================================
  Coverage   85.70%   85.70%           
=======================================
  Files         102      102           
  Lines        7703     7703           
=======================================
  Hits         6602     6602           
  Misses       1101     1101           
Flag Coverage Δ
cpu-python3.10-unit-test 70.96% <ø> (ø)
cpu-python3.12-unit-test 70.96% <ø> (ø)
cpu-python3.7-unit-test 70.43% <ø> (ø)
cuda-unit-test 83.59% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@polarG polarG enabled auto-merge (squash) March 26, 2026 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working CI/CD Continuous integration or deployment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants