BobHuang sBobHuang

👨‍💻 About Me

I'm an AI Kernel Engineer focused on bridging DSLs and hardware. I work on Triton, MLIR, LLVM, compiler IR transformations, GPU kernel optimization, and Agent-driven end-to-end workload acceleration.

🧩 Open Source Contributions

I previously maintained the following open-source projects around OpenAI/Triton.

🚀 TritonLLM

LLM inference via Triton (flexible and modular), with a focus on kernels

🔧 Triton Runner

Multi-level runner for Triton that can launch kernels from cubin, PTX, TTGIR, and other levels

💡 Triton OpenCL

Triton with an OpenCL backend, using mlir-translate to emit OpenCL source code

📖 Triton Tutorial

Getting Started with Triton: A Tutorial for Python Beginners

I also keep learning-oriented open-source notes and examples.

⚡ cuTile Learn

NVIDIA cuTile learning notes and examples

🔥 LeetGPU

Personal solutions to LeetGPU problems, primarily written in Triton, with selected CuTeDSL, CUDA, and Mojo implementations. The solutions are organized by problem, and my LeetGPU nickname is BobHuang.

🔙 Compiler Engineering Background

Previously, I worked on:

🚀 Triton new NPU backend https://github.com/triton-lang/triton
🔥 Triton TLX-style new NPU backend https://github.com/facebookexperimental/triton
🧠 PyTorch new backend https://github.com/pytorch/pytorch
🖥️ MLIR https://github.com/llvm/llvm-project
🛠️ LLVM RISC-V backend https://github.com/llvm/llvm-project
📦 libclc (OpenCL library) https://github.com/llvm/llvm-project
⚡ POCL (OpenCL runtime) https://github.com/pocl/pocl
🧩 QEMU (emulator) https://github.com/qemu/qemu
🧑‍💻 MLSynthesis (FPGA HLS tool) https://github.com/pku-liang/hector
🧪 MLSynthesis Debugger (FPGA HLS tool) https://github.com/pku-liang/Hestia
⚙️ ONNX-MLIR (lowering of ONNX models in MLIR) https://github.com/onnx/onnx-mlir
🧰 Polygeist (C/C++ frontend for MLIR) https://github.com/llvm/Polygeist

Organizations I Established

I created and maintain the following organizations:
