Sakits's Stars
01-ai/Yi
A series of large language models trained from scratch by the developers at @01-ai
zilliztech/GPTCache
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
deepseek-ai/DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
luosiallen/latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
OpenNMT/CTranslate2
Fast inference engine for Transformer models
li-plus/chatglm.cpp
C++ implementation of ChatGLM-6B, ChatGLM2-6B, ChatGLM3, and GLM4
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
S-LoRA/S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
AkariAsai/self-rag
Original implementation of SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.
openppl-public/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
jquesnelle/yarn
YaRN: Efficient Context Window Extension of Large Language Models
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
run-llama/chat-llamaindex
facebookresearch/contriever
Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning
ict-bigdatalab/awesome-pretrained-models-for-information-retrieval
A curated list of papers on pre-trained models for information retrieval (a.k.a. pre-training for IR).
jzbjyb/FLARE
Forward-Looking Active REtrieval-augmented generation (FLARE)
princeton-nlp/LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
epfml/landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
Zhen-Dong/Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
LLaVA-VL/LLaVA-Interactive-Demo
LLaVA-Interactive-Demo
urvashik/knnlm
cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
AI21Labs/in-context-ralm
amirgholami/ai_and_memory_wall
AI and Memory Wall
lm-sys/llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
yxli2123/LoftQ
IST-DASLab/QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference
bigai-nlco/LooGLE
ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models
mit-han-lab/spatten-llm
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
zhangsichengsjtu/AFPQ
AFPQ code implementation