zirui-ray-liu's Stars
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
mlc-ai/web-llm
High-performance In-browser LLM Inference Engine
exo-explore/exo
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
rapidsai/cugraph
cuGraph - RAPIDS Graph Analytics Library
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
qhjqhj00/MemoRAG
Empowering RAG with a memory-based data interface for all-purpose applications!
microsoft/MInference
[NeurIPS'24 Spotlight] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
salesforce/progen
Official release of the ProGen models
guanchuwang/redis-bench
NVlabs/DoRA
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
mirage-project/mirage
A multi-level tensor algebra superoptimizer
yuzhimanhua/Awesome-Scientific-Language-Models
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery (EMNLP'24)
ChenLiu-1996/CitationMap
A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.
awslabs/graphstorm
Enterprise graph machine learning framework for billion-scale graphs for ML scientists and data scientists.
FreedomIntelligence/HuatuoGPT-II
HuatuoGPT-II: one-stage training for medical adaptation of LLMs. (An open medical GPT)
Xiuyu-Li/q-diffusion
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
HanGuo97/flute
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
facebookresearch/RAM
A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).
WukLab/LITE
LITE Kernel RDMA Support for Datacenter Applications. SOSP 2017.
GATECH-EIC/ShiftAddLLM
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
JeanKaddour/NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
ShadenSmith/splatt
The Surprisingly ParalleL spArse Tensor Toolkit.
henryzhongsc/longctx_bench
Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches. EMNLP Findings 2024
YaoJiayi/CacheBlend
Leooyii/LCEG
Long Context Extension and Generalization in LLMs
nuclear-multimessenger-astronomy/nmma
A pythonic library for probing nuclear physics and cosmology with multimessenger analysis
zirui-ray-liu/Exact
FreedomIntelligence/DotaGPT
A Chinese medical instruction-tuning dataset