lauthu's Stars
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
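A minimal offline-inference sketch using vLLM's documented `LLM`/`SamplingParams` entry points (the model name is just an illustrative small model):

```python
from vllm import LLM, SamplingParams

# Load a model and generate with nucleus sampling; "facebook/opt-125m"
# is an arbitrary small example, not a recommendation.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

for out in llm.generate(["Hello, my name is"], params):
    print(out.outputs[0].text)
```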
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
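A quick sketch of the core `datasets` workflow (the dataset name is illustrative):

```python
from datasets import load_dataset

# Download a dataset from the Hub, then transform it with map/filter.
ds = load_dataset("imdb", split="train")
ds = ds.map(lambda ex: {"n_words": len(ex["text"].split())})
long_reviews = ds.filter(lambda ex: ex["n_words"] > 200)
print(len(long_reviews))
```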
NVIDIA/nvidia-docker
Build and run Docker containers leveraging NVIDIA GPUs
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
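A short sketch of the loralib pattern from the repo's README: swap `nn.Linear` for `lora.Linear`, then freeze everything except the low-rank adapters.

```python
import torch
import torch.nn as nn
import loralib as lora

# Toy model where one projection gets low-rank adapters (rank r=16).
model = nn.Sequential(
    lora.Linear(512, 512, r=16),  # LoRA-augmented layer
    nn.ReLU(),
    nn.Linear(512, 10),           # ordinary layer, frozen below
)

# Train only the low-rank A/B matrices; pretrained weights stay frozen.
lora.mark_only_lora_as_trainable(model)

# Checkpoint only the (small) LoRA parameters.
torch.save(lora.lora_state_dict(model), "lora_ckpt.pt")
```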
magic-research/magic-animate
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
karpathy/micrograd
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
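Micrograd's whole API fits in a few lines; this follows the README's pattern of building a scalar expression and backpropagating through it:

```python
from micrograd.engine import Value

a = Value(2.0)
b = Value(-3.0)
c = a * b + a.relu()   # build a tiny scalar computation graph
c.backward()           # reverse-mode autodiff

print(a.grad, b.grad)  # d(c)/d(a) = -2.0, d(c)/d(b) = 2.0
```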
mistralai/mistral-src
Reference implementation of Mistral AI 7B v0.1 model.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
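A hedged sketch of that Python API, assuming a recent TensorRT-LLM release with the high-level LLM entry point (engine building happens behind the call; the model name is illustrative):

```python
from tensorrt_llm import LLM, SamplingParams

# Builds (or loads) a TensorRT engine for the model, then runs inference.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

for out in llm.generate(["The capital of France is"], params):
    print(out.outputs[0].text)
```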
Dhghomon/easy_rust
Rust explained using easy English
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
skypilot-org/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
allenai/OLMo
Modeling, training, eval, and inference code for OLMo
IntelLabs/distiller
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
turboderp/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
microsoft/Olive
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
ionelmc/pytest-benchmark
py.test fixture for benchmarking code
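Usage is a single fixture: `benchmark(fn, *args)` times the call repeatedly and attaches the statistics to the test report.

```python
# test_fib.py -- run with: pytest test_fib.py
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def test_fib_10(benchmark):
    # The benchmark fixture calls fib(10) many times and records timings.
    result = benchmark(fib, 10)
    assert result == 55
```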
mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
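The core idea is a per-channel equivalence transform that migrates activation outliers into the weights. A minimal NumPy sketch of the paper's smoothing factor, s_j = max|X_j|^α / max|W_j|^(1−α):

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    """X: activations (tokens, in); W: weights (out, in).
    Returns scaled (X_hat, W_hat) with the same product X @ W.T."""
    s = np.abs(X).max(0) ** alpha / np.abs(W).max(0) ** (1 - alpha)
    s = np.clip(s, 1e-5, None)  # guard against all-zero channels
    return X / s, W * s

X = np.random.randn(8, 4) * np.array([1.0, 50.0, 1.0, 1.0])  # outlier channel
W = np.random.randn(3, 4)
X_hat, W_hat = smooth(X, W)
assert np.allclose(X @ W.T, X_hat @ W_hat.T)  # mathematically equivalent
```

After smoothing, both X_hat and W_hat have flatter per-channel ranges, which is what makes plain INT8 quantization of both sides accurate.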
microsoft/onnxruntime-inference-examples
Examples for using ONNX Runtime for machine learning inferencing.
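The basic Python flow these examples build on is short; a minimal sketch (the model path and input shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

# "model.onnx" is a placeholder; any exported model works the same way.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

x = np.random.randn(1, 3, 224, 224).astype(np.float32)  # example input
outputs = sess.run(None, {input_name: x})  # None = fetch all outputs
print(outputs[0].shape)
```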
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
Sunt-ing/database-system-readings
😋 A curated reading list about database systems
facebookresearch/LLM-QAT
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
transformer-vq/transformer_vq
openppl-public/ppl.nn.llm
ROCm/flash-attention
Fast and memory-efficient exact attention
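The library exposes a drop-in attention kernel; a sketch using `flash_attn_func`, assuming the ROCm fork mirrors the upstream API (tensor layout is (batch, seqlen, heads, head_dim) in half precision on the GPU):

```python
import torch
from flash_attn import flash_attn_func

# Example sizes; FlashAttention requires fp16/bf16 tensors on the device.
q = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # same shape as q
```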
Syencil/Programming_Massively_Parallel_Processors
Code and notes for the six major CUDA parallel computing patterns