GindaChen's Stars
Stirling-Tools/Stirling-PDF
#1 Locally hosted web application that allows you to perform various operations on PDF files
astral-sh/uv
An extremely fast Python package and project manager, written in Rust.
mem0ai/mem0
The Memory layer for your AI apps
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
meta-llama/llama-stack
Composable building blocks to build Llama Apps
pytorch/torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
lululxvi/deepxde
A library for scientific machine learning and physics-informed learning
yhzhang0128/egos-2000
Envision a future where every student can read all the code of a teaching operating system.
zou-group/textgrad
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
microsoft/MInference
[NeurIPS'24 Spotlight] To speed up inference for long-context LLMs, approximates attention with dynamic sparse computation, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
lucidrains/transfusion-pytorch
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
parrt/tensor-sensor
The goal of this library is to generate more helpful exception messages for matrix algebra expressions in numpy, pytorch, jax, tensorflow, keras, and fastai.
lucidrains/ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
FloridSleeves/LLMDebugger
LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step
nyu-systems/Grendel-GS
Ongoing research on training Gaussian splatting at scale with a distributed system
spcl/QuaRot
Code for the NeurIPS'24 paper QuaRot: end-to-end 4-bit inference of large language models.
jy-yuan/KIVI
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
AlibabaPAI/llumnix
Efficient and easy multi-instance LLM serving
Saibo-creator/Awesome-LLM-Constrained-Decoding
A curated list of papers related to constrained decoding of LLMs, along with relevant code and resources.
QingruZhang/PASTA
PASTA: Post-hoc Attention Steering for LLMs
snu-comparch/InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
Mutinifni/splitwise-sim
LLM serving cluster simulator
microsoft/llguidance
Low-level Guidance Parser
barabanshek/sabre
yunjiazhang/ReAcTable
The code base for paper: "ReAcTable: Enhancing ReAct for Table Question Answering"
dpaleka/stealing-part-lm-supplementary
Some code for "Stealing Part of a Production Language Model"
TuftsNATLab/PCS
sramshetty/stealing-part-of-an-LM
An unofficial implementation of "Stealing Part of a Production Language Model"