xiaoguoer's Stars
ml-explore/mlx
MLX: An array framework for Apple silicon
abetlen/llama-cpp-python
Python bindings for llama.cpp
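As a quick orientation, a minimal completion call might look like this (the GGUF model path is a placeholder; substitute your own file):

```python
# Minimal sketch: load a local GGUF model and run one completion.
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf", n_ctx=2048)  # load model, set context window
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```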
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
microsoft/DeepSpeedExamples
Example models using DeepSpeed
yangjianxin1/Firefly
Firefly: a training toolkit for large language models, supporting training of Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
TimDettmers/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
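In practice bitsandbytes is most often used through the 🤗 Transformers integration; a minimal 4-bit loading sketch (the model id is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize linear layers to 4 bits
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config, device_map="auto"
)
```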
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
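The core idea is a plain PyTorch decode loop made fast with torch.compile; a minimal greedy-decoding sketch (not gpt-fast's actual code; `model` is a hypothetical causal LM returning logits):

```python
import torch

@torch.no_grad()
def greedy_generate(model, tokens, max_new_tokens=32):
    # greedy decoding: append the argmax token one step at a time
    for _ in range(max_new_tokens):
        logits = model(tokens)                               # (batch, seq, vocab)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True) # most likely next token
        tokens = torch.cat([tokens, next_id], dim=-1)
    return tokens

# gpt-fast's speedups come largely from compiling the model into fused kernels:
# fast_model = torch.compile(model, mode="reduce-overhead")
```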
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
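A minimal quantize-and-save sketch following the README's flow (model id and calibration text are illustrative):

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)  # 4-bit, 128-column groups

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
examples = [tokenizer("AutoGPTQ quantizes models with the GPTQ algorithm.",
                      return_tensors="pt")]
model.quantize(examples)            # run GPTQ calibration on the example batch
model.save_quantized("opt-125m-4bit")
```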
AI4Finance-Foundation/ElegantRL
Massively Parallel Deep Reinforcement Learning. 🔥
turboderp/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
turboderp/exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
huggingface/optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
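A minimal sketch of Optimum's ONNX Runtime path (the model id is illustrative):

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)  # export to ONNX on load
inputs = tokenizer("Hello,", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```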
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
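A minimal quantization sketch following the documented flow (model path and config values are illustrative):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "facebook/opt-125m"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config=quant_config)  # AWQ calibration + 4-bit packing
model.save_quantized("opt-125m-awq")
```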
intel/intel-extension-for-pytorch
A Python package that extends the official PyTorch for easy performance gains on Intel platforms
Tencent/TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
horseee/Awesome-Efficient-LLM
A curated list of work on efficient large language models
kuleshov-group/llmtools
Finetuning Large Language Models on One Consumer GPU in 2 Bits
feifeibear/LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
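For intuition, a framework-agnostic sketch of the greedy variant (the real algorithm verifies drafts with rejection sampling over token probabilities, and a single batched target forward pass replaces the per-token calls below; `draft_next` and `target_next` are hypothetical callables mapping a token list to the next token id):

```python
def speculative_decode(target_next, draft_next, tokens, k=4, steps=32):
    """Draft k tokens with the cheap model, then verify them with the target."""
    for _ in range(steps):
        # 1) the draft model proposes k tokens autoregressively
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) the target model verifies; keep the longest agreeing prefix
        accepted = 0
        for i in range(k):
            if target_next(tokens + proposal[:i]) == proposal[i]:
                accepted += 1
            else:
                break
        tokens = tokens + proposal[:accepted]
        # 3) the target contributes one token: the correction on a mismatch,
        #    or a bonus token when every drafted token was accepted
        tokens.append(target_next(tokens))
    return tokens
```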
bytedance/effective_transformer
Running BERT without Padding
intel/xFasterTransformer
neuralmagic/sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
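The flavor of estimate it automates, as a back-of-envelope sketch (all numbers are illustrative assumptions, not the library's API):

```python
params = 7e9            # 7B-parameter model
bytes_per_param = 2     # fp16 weights
mem_weights_gb = params * bytes_per_param / 1e9
print(f"weight memory: {mem_weights_gb:.1f} GB")           # ~14 GB

# decode is memory-bandwidth bound: each generated token reads all weights once
bandwidth_gbs = 900     # e.g. A100 HBM bandwidth, GB/s
tokens_per_s = bandwidth_gbs / mem_weights_gb
print(f"roofline decode speed: {tokens_per_s:.0f} tok/s")  # ~64 tok/s per sequence
```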
intel/neural-speed
An innovative library for efficient LLM inference via low-bit quantization
huggingface/optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers, and Sentence-Transformers with full support for Optimum's hardware optimizations & quantization schemes.
inferflow/inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
wejoncy/QLLM
A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ, with easy export to ONNX/ONNX Runtime.
FreedomIntelligence/FastLLM
A fast LLM training codebase with dynamic strategy selection [DeepSpeed + Megatron + FlashAttention + CUDA fusion kernels + compiler]