tangpanyu's Stars
ggerganov/llama.cpp
LLM inference in C/C++
RVC-Boss/GPT-SoVITS
1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
triton-lang/triton
Development repository for the Triton language and compiler
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
davisking/dlib
A toolkit for making real world machine learning and data analysis applications in C++
rui314/chibicc
A small C compiler
DA-southampton/NLP_ability
A summary of the knowledge an NLP engineer needs to accumulate, including interview questions, fundamentals of all kinds, and engineering skills, to strengthen your core competitiveness.
NVIDIA/cuda-samples
Samples for CUDA developers demonstrating features of the CUDA Toolkit
DefTruth/CUDA-Learn-Notes
📚 200+ Tensor/CUDA Core kernels, ⚡️ flash-attn-mma, ⚡️ HGEMM with WMMA, MMA, and CuTe (98%–100% of cuBLAS/FA2 TFLOPS 🎉🎉).
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
CodingHanYa/workspace
workspace is a lightweight C++11 asynchronous execution framework supporting concurrent execution of generic tasks, priority-based task scheduling, an adaptive dynamic thread pool, an efficient static thread pool, an exception-handling mechanism, and more.
Liu-xiandong/How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
kaitoukito/Computer-Science-Textbooks
Collect some CS textbooks for learning.
mst272/LLM-Dojo
Welcome to LLM-Dojo, an open-source place to learn large language models, built with concise, readable code: a model-training framework (supporting mainstream models such as Qwen, Llama, GLM, etc.), an RLHF framework (DPO/CPO/KTO/PPO), and more. 👩🎓👨🎓
zeux/calm
CUDA/Metal accelerated language model inference
Bruce-Lee-LY/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.
Cjkkkk/CUDA_gemm
A simple, high-performance CUDA GEMM implementation.
KnowingNothing/MatmulTutorial
An easy-to-understand TensorOp matmul tutorial
66RING/tiny-flash-attention
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
gavinliu6/Makefile-Tutorial-zh-CN
Makefile tutorial (Chinese translation)
parallel101/stl1weekend
Build your own STL in one weekend
TiledTensor/TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
AyakaGEMM/Hands-on-GEMM
reed-lau/cute-gemm
BooHwang/segment_anything_tensorrt
Accelerate Segment Anything Model inference using TensorRT 8.6.1.6
NVIDIA/online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper
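The paper's core trick can be sketched in a few lines of Python (a minimal illustration, not the repository's benchmark code): a single pass keeps a running maximum m and a running sum d of exponentials, rescaling d whenever m grows, so the normalizer is computed in one read of the input instead of the usual separate max and sum passes.

```python
import math

def online_softmax(xs):
    """Softmax whose normalizer is computed in a single online pass.

    Maintains a running maximum m and a running sum d of exp(x_i - m);
    when a new maximum appears, the accumulated sum is rescaled by
    exp(old_m - new_m) so it stays consistent with the new reference.
    """
    m = float("-inf")  # running maximum seen so far
    d = 0.0            # running sum of exp(x_i - m)
    for x in xs:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / d for x in xs]
```

Subtracting the running maximum keeps every exponent non-positive, which is what makes the single-pass formulation numerically safe.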
ankan-ban/llama_cu_awq
Llama INT4 CUDA inference with AWQ
ankan-ban/llama2.cu
Inference Llama 2 in one file of pure CUDA
Tongkaio/MoE_inference
CUDA MoE kernels.