sbobhuang/README.md

👨‍💻 About Me

I'm an AI Kernel Engineer focused on bridging DSLs and hardware. I work on Triton, MLIR, LLVM, compiler IR transformations, GPU kernel optimization, and Agent-driven end-to-end workload acceleration.

🧩 Open Source Contributions

I previously maintained the following open-source projects around OpenAI/Triton.

🚀 TritonLLM

LLM inference via Triton (flexible and modular): focused on kernel optimization using CUBIN binaries, starting from the gpt-oss model.

🔧 Triton Runner

Multi-level Triton runner, including cubin, PTX, TTGIR, and other levels.

💡 Triton OpenCL

Triton with an OpenCL backend, using mlir-translate to emit OpenCL source code.

Getting Started with Triton: A Tutorial for Python Beginners

I also keep learning-oriented open-source notes and examples.

⚡ cuTile Learn

NVIDIA cuTile learning notes and examples

🔥 LeetGPU

Personal solutions to LeetGPU problems, primarily written in Triton, with selected CuTeDSL, CUDA, and Mojo implementations. The solutions are organized by problem, and my LeetGPU nickname is BobHuang.

🔙 Compiler Engineering Background

Previously, I worked on:

Organizations I Established

I created and maintain the following organizations:

Pinned

  1. toyaix/tritonllm Public

    LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model

    Python 115 5

  2. toyaix/triton-runner Public

    Multi-Level Triton Runner supporting Python, IR, PTX, AMDGCN, cubin and hasco.

    Python 96 5

  3. dsl-learn/LeetGPU Public

    LeetGPU Solutions

    Python 116 5

  4. dsl-learn/cutile-learn Public

    NVIDIA cuTile learning notes and examples

    Python 168 2

  5. dsl-learn/kernel-to-sol Public

    SOL-ExecBench Solutions

    Python 8

  6. dsl-learn/cuda-magic Public

    fake CUTLASS to get performance

    Python 23