ColdPorridge's Stars
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
karpathy/llm.c
LLM training in simple, raw C/CUDA
ggerganov/ggml
Tensor library for machine learning
WooooDyy/LLM-Agent-Paper-List
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
Oneflow-Inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
open-compass/opencompass
OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
TheNetAdmin/zjuthesis
Zhejiang University Graduation Thesis LaTeX Template
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
DefTruth/CUDA-Learn-Notes
📚150+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
flexflow/FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
NUS-HPC-AI-Lab/OpenDiT
OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
HuangOwen/Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
mini-sora/minisora
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
emmericp/ixy
A simple yet fast user space network driver for Intel 10 Gbit/s NICs written from scratch
kakaobrain/torchgpipe
A GPipe implementation in PyTorch
AmberLJC/LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
volcengine/veScale
A PyTorch Native LLM Training Framework
BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Eddie-Wang1120/HPC-Learning-Notes
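Study notes on high-performance computing, including notes and code demos for related topics, and still being improved. If you find it helpful, please give it a Star; it helps the author a lot. Thank you!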
intelligent-machine-learning/glake
GLake: optimizing GPU memory management and IO transmission.
hahnyuan/LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
facebookresearch/HolisticTraceAnalysis
A library to analyze PyTorch traces.
alibaba-edu/High-Precision-Congestion-Control
sail-sg/zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
ljgibbslf/Chinese-Translation-of-PCI-Express-Technology-
Chinese translation of <PCI Express Technology: Comprehensive Guide to Generations 1.x, 2.x and 3.0> by Mindshare
S-Lab-System-Group/Awesome-DL-Scheduling-Papers
eniac/paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
lastweek/lastweek.github.io
Yizhou's Homepage
firechecking/CleanParallel
An implementation of training techniques such as AMP, DDP, PP, and TP, for learning purposes
microsoft/inspector-topo
An interconnect topology detection tool for Azure VMs