I'm an AI Kernel Engineer focused on bridging DSLs and hardware. I work on Triton, MLIR, LLVM, compiler IR transformations, GPU kernel optimization, and Agent-driven end-to-end workload acceleration.
I previously maintained the following open-source projects around OpenAI/Triton.
TritonLLM
LLM Inference via Triton (Flexible & Modular): Focused on Kernel
Triton Runner
A multi-level runner for Triton that can launch kernels from cubin, PTX, TTGIR, and other intermediate levels.
Triton OpenCL
An OpenCL backend for Triton that uses mlir-translate to generate OpenCL source code.
Triton Tutorial
Getting Started with Triton: A Tutorial for Python Beginners
I also keep learning-oriented open-source notes and examples.
cuTile Learn
NVIDIA cuTile learning notes and examples
LeetGPU
Personal solutions to LeetGPU problems, primarily written in Triton, with selected CuTeDSL, CUDA, and Mojo implementations. The solutions are organized by problem, and my LeetGPU nickname is BobHuang.
Previously, I worked on:
- New NPU backend for Triton https://github.com/triton-lang/triton
- New TLX-style NPU backend for Triton https://github.com/facebookexperimental/triton
- New backend for PyTorch https://github.com/pytorch/pytorch
- MLIR https://github.com/llvm/llvm-project
- LLVM RISC-V backend https://github.com/llvm/llvm-project
- libclc (OpenCL library) https://github.com/llvm/llvm-project
- POCL (OpenCL runtime) https://github.com/pocl/pocl
- QEMU (emulator) https://github.com/qemu/qemu
- MLSynthesis (FPGA HLS tool) https://github.com/pku-liang/hector
- MLSynthesis Debugger (FPGA HLS tool) https://github.com/pku-liang/Hestia
- ONNX-MLIR (lowering of ONNX models in MLIR) https://github.com/onnx/onnx-mlir
- Polygeist (C/C++ frontend for MLIR) https://github.com/llvm/Polygeist
I created and maintain the following organizations:




