ColdPorridge

ColdPorridge's Stars

pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: Python | 85.6k 1.8k 47.9k 23k
karpathy/llm.c
LLM training in simple, raw C/CUDA
Language: Cuda | 24.9k 252 1412.8k
ggerganov/ggml
Tensor library for machine learning
Language: C++ | 11.5k 134 4311.1k
WooooDyy/LLM-Agent-Paper-List
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
7k 138 14418
Oneflow-Inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
Language: C++ | 6.5k 154 1k 728
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Language: Python | 4.4k 26 581468
TheNetAdmin/zjuthesis
Zhejiang University Graduation Thesis LaTeX Template
Language: TeX | 2.8k 15 322641
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Language: Python | 2.3k 33 254149
DefTruth/CUDA-Learn-Notes
📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
Language: Cuda | 1.9k 14 9195
flexflow/FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
Language: C++ | 1.7k 30 664234
NUS-HPC-AI-Lab/OpenDiT
OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
Language: Python | 1.4k 23 6093
HuangOwen/Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
1.3k 42 686
mini-sora/minisora
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
Language: Python | 1.2k 19 65151
emmericp/ixy
A simple yet fast user space network driver for Intel 10 Gbit/s NICs written from scratch
Language: C | 1.2k 45 15126
kakaobrain/torchgpipe
A GPipe implementation in PyTorch
Language: Python | 819 33 33100
AmberLJC/LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
717 29 126
volcengine/veScale
A PyTorch Native LLM Training Framework
Language: Python | 688 33 1836
BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Language: Python | 502 11 826
Eddie-Wang1120/HPC-Learning-Notes
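Study notes on high-performance computing, including notes and code demos for related topics, with ongoing improvements. If you find it helpful, please give it a Star; it means a lot to the author. Thank you!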
Language: Jupyter Notebook | 392 6 135
intelligent-machine-learning/glake
GLake: optimizing GPU memory management and IO transmission.
Language: Python | 381 7 2233
hahnyuan/LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
Language: Python | 359 2 1444
facebookresearch/HolisticTraceAnalysis
A library to analyze PyTorch traces.
Language: Python | 317 18 5845
alibaba-edu/High-Precision-Congestion-Control
Language: Python | 303 10 51156
sail-sg/zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
Language: Python | 303 7 2916
ljgibbslf/Chinese-Translation-of-PCI-Express-Technology-
Chinese Translation of <PCI Express Technology Comprehensive Guide to Generations 1.x, 2.x and 3.0> by Mindshare
277 8 494
S-Lab-System-Group/Awesome-DL-Scheduling-Papers
267 12 833
eniac/paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
Language: C++ | 58 4 16
lastweek/lastweek.github.io
Yizhou's Homepage
Language: HTML | 46 3 05
firechecking/CleanParallel
an implementation of parallel skills like amp, ddp, pp, tp for learning purposes
Language: Python | 12 1 00
microsoft/inspector-topo
An interconnect topology detection tool for Azure VMs
Language: C++ | 8 5 11