Sakits's Stars
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
modularml/mojo
The Mojo Programming Language
joonspk-research/generative_agents
Generative Agents: Interactive Simulacra of Human Behavior
facebookresearch/nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
jzhang38/TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
ray-project/llm-numbers
Numbers every LLM developer should know
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
horseee/Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
microsoft/Llama-2-Onnx
punica-ai/punica
Serving multiple LoRA fine-tuned LLMs as one
zhanglj37/Tutorial-on-PhD-Application
Tutorial on PhD Application
mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
mryab/efficient-dl-systems
Efficient Deep Learning Systems course materials (HSE, YSDA)
THUDM/LongBench
[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
abacusai/Long-Context
This repository contains code and tooling for the Abacus.AI LLM Context Expansion project. Also included are evaluation scripts and benchmark tasks that evaluate a model’s information retrieval capabilities with context expansion. We also include key experimental results and instructions for reproducing and building on them.
huggingface/pytorch_block_sparse
Fast Block Sparse Matrices for PyTorch
openppl-public/ppl.cv
ppl.cv is a high-performance image processing library of openPPL supporting various platforms.
bojone/rerope
Rectified Rotary Position Embeddings
FMInference/DejaVu
mlc-ai/binary-mlc-llm-libs
mit-han-lab/parallel-computing-tutorial
kyegomez/FlashAttention20
Get down and dirty with FlashAttention 2.0 in PyTorch: plug and play, no complex CUDA kernels
mit-han-lab/tinychat-tutorial
DS3Lab/Decentralized_FM_alpha
mlc-ai/dlight-bench