Pinned Repositories
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
neural-compressor
Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) aims to provide unified APIs for network compression technologies such as low-precision quantization, sparsity, pruning, and knowledge distillation across different deep learning frameworks, in pursuit of optimal inference performance.
oneDNN
oneAPI Deep Neural Network Library (oneDNN)
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
torchao-fork
The torchao repository contains APIs and workflows for quantizing and pruning GPU models.
torchutils
Torch helper functions
yiliu30's Repositories
yiliu30/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
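As a rough illustration of the core API, a minimal training loop under DeepSpeed looks like the sketch below; the toy model, data, and `ds_config` values are placeholders (not settings from this fork), and a real run goes through the `deepspeed` launcher:

```python
# Minimal DeepSpeed loop (sketch; toy model/data, placeholder config).
import torch
import deepspeed

model = torch.nn.Linear(16, 1)
data = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(10)]
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

# deepspeed.initialize wraps the model in an engine that owns the
# optimizer, gradient handling, and (if configured) ZeRO partitioning.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config)

for x, y in data:
    x, y = x.to(model_engine.device), y.to(model_engine.device)
    loss = torch.nn.functional.mse_loss(model_engine(x), y)
    model_engine.backward(loss)   # engine-managed backward
    model_engine.step()           # optimizer step + LR schedule
```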
yiliu30/neural-compressor
Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) aims to provide unified APIs for network compression technologies such as low-precision quantization, sparsity, pruning, and knowledge distillation across different deep learning frameworks, in pursuit of optimal inference performance.
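For orientation, post-training INT8 quantization with the version-dependent 2.x-style API looks roughly like this; the toy model and calibration data are placeholders:

```python
# Post-training quantization sketch (Neural Compressor 2.x-style API;
# entry points vary across releases). Toy model and calibration data.
import torch
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
calib_loader = torch.utils.data.DataLoader(
    [(torch.randn(16), 0) for _ in range(32)], batch_size=8)

conf = PostTrainingQuantConfig()          # default INT8 PTQ settings
q_model = fit(model=model, conf=conf, calib_dataloader=calib_loader)
q_model.save("./quantized_model")
```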
yiliu30/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
yiliu30/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
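The "dynamic" part is define-by-run autograd: the graph is built by simply running Python code, then differentiated in reverse mode:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()                     # ops are recorded as they execute
y.backward()                           # reverse-mode autodiff
print(torch.allclose(x.grad, 2 * x))   # d/dx sum(x^2) = 2x  ->  True
```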
yiliu30/torchao-fork
The torchao repository contains APIs and workflows for quantizing and pruning GPU models.
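In recent torchao releases, weight-only quantization is a one-call, in-place transform; a sketch with a stand-in model (older versions expose different entry points):

```python
# Weight-only INT8 quantization sketch with torchao's quantize_ API.
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(torch.nn.Linear(64, 64))
quantize_(model, int8_weight_only())   # swaps Linear weights in place
```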
yiliu30/torchutils
Torch helper functions
yiliu30/accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
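The core idea is that wrapping a plain PyTorch loop with `Accelerator` is nearly the whole integration; a minimal sketch with a toy model and data:

```python
# prepare() moves model/optimizer/dataloader to the right device(s),
# and accelerator.backward replaces loss.backward.
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.Adam(model.parameters())
loader = torch.utils.data.DataLoader(torch.randn(64, 16), batch_size=8)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
for batch in loader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()
    accelerator.backward(loss)
    optimizer.step()
```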
yiliu30/ai-pr-reviewer
AI-based Pull Request Summarizer and Reviewer with Chat Capabilities.
yiliu30/auto-awq-fork
AutoAWQ implements the AWQ algorithm for 4-bit quantization, with about a 2x speedup during inference.
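Following the upstream AutoAWQ README pattern, quantization looks roughly like the sketch below; the checkpoint, output path, and config values are placeholders:

```python
# AWQ 4-bit quantization sketch in the upstream AutoAWQ style.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path, quant_path = "facebook/opt-125m", "opt-125m-awq"
quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config=quant_config)  # calibrate + pack
model.save_quantized(quant_path)
```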
yiliu30/auto-round
SOTA Weight-only Quantization Algorithm for LLMs
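A minimal quantization run, assuming the README-style API; the checkpoint name and settings are illustrative, and keyword names may shift between releases:

```python
# auto-round weight-only quantization sketch.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"   # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()               # runs the rounding optimization
autoround.save_quantized("./opt-125m-autoround")
```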
yiliu30/AutoGPTQ-fork
An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
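Its README-style flow, sketched with a small placeholder checkpoint and a one-sample calibration set:

```python
# GPTQ 4-bit quantization sketch in the AutoGPTQ style; checkpoint and
# calibration text are placeholders.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
examples = [tokenizer("auto-gptq is an easy-to-use quantization package.",
                      return_tensors="pt")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_name, quantize_config)
model.quantize(examples)           # calibrate and quantize layer by layer
model.save_quantized("opt-125m-4bit")
```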
yiliu30/gemma.cpp
A lightweight, standalone C++ inference engine for Google's Gemma models.
yiliu30/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 lines of Python.
yiliu30/hqq-fork
Official implementation of Half-Quadratic Quantization (HQQ)
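HQQ works layer by layer; a sketch of wrapping one `nn.Linear`, with argument names taken from the HQQ README (they may vary across versions):

```python
# HQQ single-layer sketch; HQQLinear replaces an existing nn.Linear.
import torch
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

linear = torch.nn.Linear(4096, 4096)
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)
hqq_layer = HQQLinear(linear, quant_config=quant_config,
                      compute_dtype=torch.float16, device="cuda")
```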
yiliu30/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
yiliu30/ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
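The integration point is a drop-in `from_pretrained` that mirrors HF Transformers, with low-bit loading enabled by a flag; the checkpoint below is a placeholder:

```python
# ipex-llm loading sketch: same surface as transformers, plus
# load_in_4bit for low-bit weights on Intel CPU/GPU.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"   # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("What is deep learning?", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```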
yiliu30/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
yiliu30/ml-engineering
Machine Learning Engineering Open Book
yiliu30/nn-zero-to-hero
Neural Networks: Zero to Hero
yiliu30/notes
yiliu30/optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
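Usage mirrors the HF `Trainer`, swapping in the Gaudi variants; the model, toy dataset, and `gaudi_config_name` below are placeholders, and a real run requires Gaudi hardware:

```python
# Gaudi training sketch; GaudiTrainer/GaudiTrainingArguments mirror the
# HF Trainer API.
import torch
from optimum.habana import GaudiTrainer, GaudiTrainingArguments
from transformers import AutoModelForSequenceClassification

class ToyDS(torch.utils.data.Dataset):      # stand-in tokenized dataset
    def __len__(self): return 8
    def __getitem__(self, i):
        return {"input_ids": torch.tensor([101, 102]),
                "attention_mask": torch.tensor([1, 1]),
                "labels": torch.tensor(0)}

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/bert-base-uncased",
)
GaudiTrainer(model=model, args=args, train_dataset=ToyDS()).train()
```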
yiliu30/subclass_zoo
yiliu30/tgi
Large Language Model Text Generation Inference
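Clients talk to a running TGI server over REST; the sketch below assumes a server is already serving a model on localhost:8080 (e.g. from the official Docker image):

```python
# Query a running text-generation-inference server.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={"inputs": "What is deep learning?",
          "parameters": {"max_new_tokens": 32}},
)
print(resp.json()["generated_text"])
```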
yiliu30/Torch-Fx-Graph-Visualizer
Visualizer for neural network, deep learning and machine learning models
yiliu30/training-operator
Training operators on Kubernetes.
yiliu30/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
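The quickest entry point is the `pipeline` API; `gpt2` is just a small public example checkpoint:

```python
# Minimal transformers pipeline usage.
from transformers import pipeline

pipe = pipeline("text-generation", model="gpt2")
print(pipe("Distributed training is", max_new_tokens=20)[0]["generated_text"])
```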
yiliu30/tutorials
PyTorch tutorials.
yiliu30/xTuring
Easily build, customize and control your own LLMs
yiliu30/yi
yiliu30/yiliu30.github.io.tmp