Pinned Repositories
llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
MQBench
Model Quantization Benchmark
awesome-lm-system
Summary of system papers/frameworks/codes/tools on training or serving large models
DeepSpeedExamples
Example models using DeepSpeed
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
FasterTransformer
Transformer-related optimization, including BERT and GPT
Model-Compression-Research-Package
A library for researching neural network compression and acceleration methods.
TensorRT
TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.
ZhangYunchenY's Repositories
ZhangYunchenY/awesome-lm-system
Summary of system papers/frameworks/codes/tools on training or serving large models
ZhangYunchenY/DeepSpeedExamples
Example models using DeepSpeed
ZhangYunchenY/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
ZhangYunchenY/FasterTransformer
Transformer-related optimization, including BERT and GPT
ZhangYunchenY/llmc
llmc is an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.
ZhangYunchenY/Model-Compression-Research-Package
A library for researching neural network compression and acceleration methods.
ZhangYunchenY/MQBench
Model Quantization Benchmark
ZhangYunchenY/TensorRT
TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.