zhurou603's Stars
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
AccumulateMore/CV
✔ (Completed) The most comprehensive deep learning notes, covering Tudui's PyTorch tutorials, Mu Li's "Dive into Deep Learning", and Andrew Ng's deep learning course
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 lines of Python.
daquexian/onnx-simplifier
Simplify your ONNX model
DefTruth/Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
AlexanderZhou01/China-software-copyright
Template documents for Chinese software copyright applications
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
intelligent-machine-learning/dlrover
DLRover: An Automatic Distributed Deep Learning System
NVIDIA/cccl
CUDA Core Compute Libraries
huggingface/nanotron
Minimalistic large language model 3D-parallelism training
ECNU-ICALK/EduChat
An open-source educational chat model from ICALK, East China Normal University. An open-source Chinese-English educational dialogue LLM (general-purpose base model, GPU deployment, data cleaning). With thanks to: LLaMA, MOSS, BELLE, Ziya, vLLM
volcengine/veScale
A PyTorch Native LLM Training Framework
tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
DefTruth/CUDA-Learn-Note
🎉 CUDA notes / hand-written CUDA kernels for LLMs / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
feifeibear/long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
BBuf/how-to-learn-deep-learning-framework
How to learn PyTorch and OneFlow
zhangyachen/ComputerArchitectureAndCppBooks
📚 A collection of computer architecture and C++ books (continuously updated)
hahnyuan/LLM-Viewer
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
nihui/ruapu
Detect CPU features with a single file
Yinghan-Li/YHs_Sample
Yinghan's Code Sample
RussWong/CUDATutorial
A CUDA tutorial that teaches CUDA programming from scratch
CalvinXKY/BasicCUDA
A tutorial for CUDA & PyTorch
njuhope/cuda_sgemm
feifeibear/LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
elithnever/distributedtechshare
Tracking distributed systems technology
hzwer/brief_paper_reading
My paper-reading notes and insights
BBuf/How_to_optimize_in_GPU
A series of GPU optimization topics explaining in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is at or near the theoretical limit.