xingjinglu's Stars
michaelfeil/infinity
Infinity is a high-throughput, low-latency serving engine for text-embedding and reranking models, CLIP, CLAP, and ColPali
sgl-project/sgl-learning-materials
Materials for learning SGLang
LyWangPX/Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions
Solutions to the exercises in Reinforcement Learning: An Introduction (2nd Edition)
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
dottxt-ai/outlines
Structured Text Generation
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
skypilot-org/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
ShishirPatil/gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
pcg-mlp/KsanaLLM
pku-liang/MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
mcrl/tccl
Thunder Research Group's Collective Communication Library
byungsoo-oh/ml-systems-papers
Curated collection of papers in machine learning systems
Yanz2015/architecture.wechat-tencent
Internet company architectures: WeChat technical architecture, Tencent technical architecture
rohan-paul/LLM-FineTuning-Large-Language-Models
LLM (Large Language Model) FineTuning
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
lamini-ai/lamini
The Official Python Client for Lamini's API
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to this project.
comfyanonymous/ComfyUI
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
OpenPPL/ppl.nn
A primitive library for neural networks
trailofbits/vast
VAST is an experimental compiler pipeline designed for program analysis of C and C++. It provides a tower of IRs as MLIR dialects to choose the best fit representations for a program analysis or further program abstraction.
ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
BBuf/tvm_mlir_learn
A collection of compiler learning resources (TVM, MLIR).
Lin-Mao/DrGPUM
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
Jokeren/triton-samples
mlcommons/inference
Reference implementations of MLPerf™ inference benchmarks
mlcommons/training
Reference implementations of MLPerf™ training benchmarks
jmellorcrummey/cupti-test
Test overhead of CUPTI PC sampling for CUDA 10
ROCm/triton
Development repository for the Triton language and compiler