Sakits's Stars
01-ai/Yi
A series of large language models trained from scratch by the developers at @01-ai
zilliztech/GPTCache
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
deepseek-ai/DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
luosiallen/latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
OpenNMT/CTranslate2
Fast inference engine for Transformer models
li-plus/chatglm.cpp
C++ implementation of ChatGLM-6B, ChatGLM2-6B, ChatGLM3, and GLM4
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
S-LoRA/S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
AkariAsai/self-rag
Original implementation of SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.
openppl-public/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
jquesnelle/yarn
YaRN: Efficient Context Window Extension of Large Language Models
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
run-llama/chat-llamaindex
facebookresearch/contriever
Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning
ict-bigdatalab/awesome-pretrained-models-for-information-retrieval
A curated list of papers on pre-trained models for information retrieval (a.k.a. pre-training for IR).
jzbjyb/FLARE
Forward-Looking Active REtrieval-augmented generation (FLARE)
princeton-nlp/LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
epfml/landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
Zhen-Dong/Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
LLaVA-VL/LLaVA-Interactive-Demo
LLaVA-Interactive-Demo
urvashik/knnlm
cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
AI21Labs/in-context-ralm
amirgholami/ai_and_memory_wall
AI and Memory Wall
lm-sys/llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
yxli2123/LoftQ
IST-DASLab/QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference
bigai-nlco/LooGLE
ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models
mit-han-lab/spatten-llm
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
zhangsichengsjtu/AFPQ
AFPQ code implementation