Pinned Repositories
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
ao
The torchao repository contains APIs and workflows for quantizing and pruning GPU models.
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
neural-compressor
Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool) provides unified APIs for network compression techniques such as low-precision quantization, sparsity, pruning, and knowledge distillation across deep learning frameworks, targeting optimal inference performance.
oneDNN
oneAPI Deep Neural Network Library (oneDNN)
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
yiliu30's Repositories
yiliu30/ao
The torchao repository contains APIs and workflows for quantizing and pruning GPU models.
yiliu30/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
yiliu30/neural-compressor
Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool) provides unified APIs for network compression techniques such as low-precision quantization, sparsity, pruning, and knowledge distillation across deep learning frameworks, targeting optimal inference performance.
yiliu30/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
yiliu30/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
yiliu30/accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
yiliu30/ai-pr-reviewer
AI-based Pull Request Summarizer and Reviewer with Chat Capabilities.
yiliu30/auto-round
SOTA Weight-only Quantization Algorithm for LLMs
yiliu30/awesome-model-quantization
A curated list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improved. PRs adding works (papers, repositories) the repo has missed are welcome.
yiliu30/CodeXGLUE
CodeXGLUE
yiliu30/gemma.cpp
Lightweight, standalone C++ inference engine for Google's Gemma models.
yiliu30/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
yiliu30/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
yiliu30/hugo-PaperMod
A fast, clean, responsive Hugo theme.
yiliu30/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offering SOTA compression techniques for LLMs; running LLMs efficiently on Intel platforms ⚡
yiliu30/mpi-operator
Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
yiliu30/nn-zero-to-hero
Neural Networks: Zero to Hero
yiliu30/onnxruntime
ONNX Runtime: cross-platform, high-performance ML inferencing and training accelerator
yiliu30/optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
yiliu30/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
yiliu30/py-style-test
yiliu30/subclass_zoo
yiliu30/Test
yiliu30/tgi
Large Language Model Text Generation Inference
yiliu30/Torch-Fx-Graph-Visualizer
Visualizer for neural network, deep learning and machine learning models
yiliu30/training-operator
Training operators on Kubernetes.
yiliu30/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
yiliu30/tutorials
PyTorch tutorials.
yiliu30/xTuring
Easily build, customize and control your own LLMs
yiliu30/yiliu30