Pinned Repositories
cuda_back2back_hgemm
Uses tensor cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores, via the WMMA API and MMA PTX instructions.
cuda_hgemv
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
cuda_hook
Hooks CUDA-related dynamic libraries using automated code generation tools.
cutlass_gemm
Multiple GEMM operators built with CUTLASS to support LLM inference.
decoding_attention
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores during the decoding stage of LLM inference.
flash_attention_inference
Benchmarks the performance of the C++ interface of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
matrix_multiply
Several common matrix multiplication methods implemented on the CPU and NVIDIA GPUs using C++11 and CUDA.
memory_pool
A simple and efficient memory pool implemented in C++11.
thread_pool
A thread pool that processes a task queue, implemented in C++11.
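The thread_pool entry above describes a C++11 task-queue worker pool. As a rough illustration of that pattern (a minimal sketch, not the repository's actual code; all names here are hypothetical), workers block on a condition variable and pop tasks from a shared queue:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal C++11 thread pool sketch: N workers drain a shared task queue.
class ThreadPool {
public:
    explicit ThreadPool(size_t n) {
        for (size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lock(mutex_);
                        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                        if (stop_ && tasks_.empty()) return;  // drained and shutting down
                        task = std::move(tasks_.front());
                        tasks_.pop();
                    }
                    task();  // run the task outside the lock
                }
            });
        }
    }

    void Enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto &w : workers_) w.join();  // workers finish queued tasks first
    }

private:
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
};
```

The destructor sets the stop flag and joins the workers, which exit only once the queue is empty, so all enqueued tasks run before shutdown.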
Bruce-Lee-LY's Repositories
Bruce-Lee-LY/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores, via the WMMA API and MMA PTX instructions.
Bruce-Lee-LY/cuda_hook
Hooks CUDA-related dynamic libraries using automated code generation tools.
Bruce-Lee-LY/cuda_hgemv
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
Bruce-Lee-LY/flash_attention_inference
Benchmarks the performance of the C++ interface of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
Bruce-Lee-LY/decoding_attention
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores during the decoding stage of LLM inference.
Bruce-Lee-LY/matrix_multiply
Several common matrix multiplication methods implemented on the CPU and NVIDIA GPUs using C++11 and CUDA.
Bruce-Lee-LY/cuda_back2back_hgemm
Uses tensor cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.
Bruce-Lee-LY/memory_pool
A simple and efficient memory pool implemented in C++11.
Bruce-Lee-LY/cutlass_gemm
Multiple GEMM operators built with CUTLASS to support LLM inference.
Bruce-Lee-LY/thread_pool
A thread pool that processes a task queue, implemented in C++11.
Bruce-Lee-LY/deep_learning
Training and inference of several common deep learning models implemented with TensorFlow and PyTorch.
Bruce-Lee-LY/algorithm_design
Several algorithm design methods applied to common problems, implemented in C++11.
Bruce-Lee-LY/crawler
Several fun web crawler examples implemented in Python.
Bruce-Lee-LY/data_structure
Several commonly used data structures implemented in C++11.
Bruce-Lee-LY/machine_learning
Several common machine learning algorithms implemented with scikit-learn.
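The memory_pool entries in the lists above describe a C++11 memory pool. A minimal fixed-block free-list sketch of that idea (an illustration only, not the repository's actual design; all names are hypothetical) pre-allocates one contiguous slab and recycles blocks in O(1):

```cpp
#include <cstddef>
#include <vector>

// Minimal C++11 fixed-size memory pool sketch: one contiguous slab,
// blocks handed out through a free list; O(1) allocate and deallocate.
class MemoryPool {
public:
    MemoryPool(size_t block_size, size_t block_count)
        : storage_(block_size * block_count), block_size_(block_size) {
        free_list_.reserve(block_count);
        for (size_t i = 0; i < block_count; ++i) {
            free_list_.push_back(storage_.data() + i * block_size_);
        }
    }

    // Returns a free block, or nullptr when the pool is exhausted.
    void* Allocate() {
        if (free_list_.empty()) return nullptr;
        void* p = free_list_.back();
        free_list_.pop_back();
        return p;
    }

    // Returns a block obtained from Allocate() to the free list.
    void Deallocate(void* p) { free_list_.push_back(static_cast<char*>(p)); }

private:
    std::vector<char> storage_;    // one contiguous slab of memory
    size_t block_size_;
    std::vector<char*> free_list_; // pointers to currently free blocks
};
```

Because blocks come from a single pre-allocated slab, allocation never touches the system allocator on the hot path; the trade-off is a fixed block size and capacity chosen up front.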