minefantast's Stars
zjhellofss/KuiperLLama
A great project for campus recruiting (autumn/spring recruitment) and internships: build an LLM inference framework from scratch, with support for LLama2/3 and Qwen2.5.
DefTruth/CUDA-Learn-Notes
📚Modern CUDA Learn Notes with PyTorch: Tensor/CUDA Cores, 📖150+ CUDA Kernels with PyTorch bindings, 📖HGEMM/SGEMM (95%~99% cuBLAS performance), 📖100+ LLM/CUDA Blogs.
daemyung/metal-by-tutorials-2nd
Metal by Tutorials, by the raywenderlich Tutorial Team
Yinghan-Li/YHs_Sample
Yinghan's Code Sample
KhronosGroup/Vulkan-Samples
One stop solution for all Vulkan samples
Bruce-Lee-LY/decoding_attention
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.
Bruce-Lee-LY/flash_attention_inference
Performance of the C++ interfaces of Flash Attention and Flash Attention v2 in large language model (LLM) inference scenarios.
Bruce-Lee-LY/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.
leimao/TensorRT-Custom-Plugin-Example
Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration
leimao/CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
openai/openai-gemm
Open single- and half-precision GEMM implementations
KhronosGroup/glslang
Khronos-reference front end for GLSL/ESSL, partial front end for HLSL, and a SPIR-V generator.
Keenuts/vulkan-compute
related to virglrender-vulkan: basic compute test application
vblanco20-1/vulkan-guide
Introductory guide to Vulkan.
KomputeProject/kompute
General-purpose GPU compute framework built on Vulkan to support thousands of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous, and optimized for advanced GPU data-processing use cases. Backed by the Linux Foundation.
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Sunt-ing/stick
:innocent: A PyTorch-like deep learning framework. Just for fun.
feifeibear/LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
NervanaSystems/maxas
Assembler for NVIDIA Maxwell architecture
cloudcores/CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
minitorch/minitorch
The full minitorch student suite.
GetUpEarlier/minit
karpathy/llm.c
LLM training in simple, raw C/CUDA
Jittor/jittor
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
mlc-ai/notebooks
mlc-ai/mlc-zh
hyperai/tvm-cn
TVM Documentation in Simplified Chinese
alibaba/MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
ggerganov/llama.cpp
LLM inference in C/C++