zyang37's Stars
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
stanfordnlp/dspy
DSPy: The framework for programming—not prompting—foundation models
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
LukeMathWalker/zero-to-production
Code for "Zero To Production In Rust", a book on API development using Rust.
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
karpathy/build-nanogpt
Video+code lecture on building nanoGPT from scratch
pytorch/torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
wdndev/llm_interview_note
Notes on knowledge and interview questions relevant to large language model (LLM) algorithm/application engineers
cuda-mode/lectures
Material for cuda-mode lectures
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
zwang4/awesome-machine-learning-in-compilers
Must read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
harvard-edge/cs249r_book
A collaboratively written book on Machine Learning Systems
UMass-Foundation-Model/3D-LLM
Code for 3D-LLM: Injecting the 3D World into Large Language Models
ysymyth/awesome-language-agents
List of language agents based on paper "Cognitive Architectures for Language Agents"
efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
mit-han-lab/distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
deeperlearning/professional-cuda-c-programming
Xiuyu-Li/q-diffusion
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
mit-han-lab/Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
quantumlib/Qualtran
Qualtran is a Python library for expressing and analyzing fault-tolerant quantum algorithms.
NVIDIA/mig-parted
MIG Partition Editor for NVIDIA GPUs
snu-comparch/InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
xdit-project/DistVAE
A parallelized VAE that avoids OOM during high-resolution image generation
MDK8888/vllmini
A minimal implementation of vLLM.
amazon-science/piperag
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design
MaoZiming/papers
Paper-reading notes for Berkeley OS prelim exam.