LinkZyy's Stars
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
d2l-ai/d2l-zh
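Dive into Deep Learning: written for Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.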
xai-org/grok-1
Grok open release
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
joonspk-research/generative_agents
Generative Agents: Interactive Simulacra of Human Behavior
triton-lang/triton
Development repository for the Triton language and compiler
academicpages/academicpages.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
cpacker/MemGPT
Letta (fka MemGPT) is a framework for creating stateful LLM services.
RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
libsdl-org/SDL
Simple DirectMedia Layer
cumulo-autumn/StreamDiffusion
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
S-LoRA/S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
flexflow/FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
punica-ai/punica
Serving multiple LoRA fine-tuned LLMs as one
feifeibear/LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
AmadeusChan/Awesome-LLM-System-Papers
AI21Labs/in-context-ralm
microsoft/vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
geohot/cuda_ioctl_sniffer
Sniff CUDA ioctls
ACL2023-Retrieval-LM/ACL2023-Retrieval-LM.github.io
https://acl2023-retrieval-lm.github.io/
nightdessert/Retrieval_Head
open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality
princeton-nlp/MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
UofT-EcoSystem/DietCode
DietCode Code Release
summerspringwei/souffle-ae