LiuXiaoxuanPKU's Stars
ggerganov/llama.cpp
LLM inference in C/C++
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
karpathy/llm.c
LLM training in simple, raw C/CUDA
plasma-umass/scalene
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
srush/GPU-Puzzles
Solve puzzles. Learn CUDA.
SkalskiP/courses
A curated collection of links to courses and resources about Artificial Intelligence (AI)
DefTruth/Awesome-LLM-Inference
📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, Flash-Attention, Paged-Attention, parallelism, etc. 🎉
Zjh-819/LLMDataHub
A quick guide to trending instruction fine-tuning datasets
huggingface/blog
Public repo for HF blog posts
FranxYao/chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
huachaohuang/awesome-dbdev
Awesome materials about database development.
the-full-stack/website
Source for https://fullstackdeeplearning.com
kakaobrain/torchgpipe
A GPipe implementation in PyTorch
THUDM/LongBench
LongBench v2 and LongBench (ACL 2024)
mirage-project/mirage
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
hemingkx/SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
apoorvumang/prompt-lookup-decoding
rmihaylov/falcontune
Fine-tune any FALCON model in 4-bit
bojone/NBCE
Naive Bayes-based Context Extension
lucidrains/speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
amesar/mlflow-examples
Basic and advanced MLflow examples for many ML flavors
HPMLL/BurstGPT
A ChatGPT (GPT-3.5) and GPT-4 workload trace for optimizing LLM serving systems
r2e-project/r2e
r2e: turn any GitHub repository into a programming agent environment
EmbeddedLLM/vllm-rocm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
tyler-griggs/melange-release
vllm-project/dashboard
vLLM performance dashboard
flashinfer-ai/debug-print
Debug-print operator for CUDA graph debugging