Pinned Repositories
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
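For context, a minimal sketch of what a training step wrapped with accelerate typically looks like; the model, optimizer, and data below are toy placeholders, not taken from this repository:

```python
# Minimal sketch of a training loop wrapped with Hugging Face accelerate.
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # device placement / DDP / mixed precision come from the launch config

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves everything to the right device(s) and wraps objects for distributed training
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```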
ADMM-NN
admm-pruning
Prune DNN using Alternating Direction Method of Multipliers (ADMM)
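A hedged sketch of the ADMM pruning idea (variable names and hyperparameters are illustrative, not this repository's code): the weights get an extra quadratic penalty pulling them toward an auxiliary variable Z, Z is the projection of W + U onto the sparsity constraint, and U is the scaled dual variable.

```python
# Illustrative ADMM-pruning step (not the repository's implementation).
import torch

def project_topk(w, k):
    """Euclidean projection onto the set of tensors with at most k non-zeros."""
    flat = w.flatten()
    keep = flat.abs().topk(k).indices
    mask = torch.zeros_like(flat)
    mask[keep] = 1.0
    return (flat * mask).view_as(w)

def admm_prune_step(w, z, u, grad, lr=1e-2, rho=1e-3, k=100):
    # W-update: loss gradient plus the ADMM penalty rho * (W - Z + U)
    w = w - lr * (grad + rho * (w - z + u))
    # Z-update: project W + U onto the k-sparse constraint set
    z = project_topk(w + u, k)
    # dual update
    u = u + w - z
    return w, z, u
```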
AI-Job-Notes
A job-hunting guide for AI algorithm roles (covering preparation strategy, coding-problem practice guides, internal referrals, a list of AI companies, and more)
algorithm
My LeetCode Solutions with Explanation and Time Complexity Analysis
Awesome-Pruning
A curated list of neural network pruning resources.
Interview-Notebook
:books: A summary of the fundamentals you need to master for technical interviews
ml-road
Machine Learning Resources, Practice and Research
Model_Compression_Paper
Tools-NetworkModeViewer-Netron
Visualizer for neural network, deep learning and machine learning models
YongHuaZhang-BUAA's Repositories
YongHuaZhang-BUAA/bitsandbytes
LLM: 8-bit CUDA functions for PyTorch
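A minimal sketch of the kind of usage this fork targets, assuming the standard bitsandbytes 8-bit optimizer API (the model is a toy placeholder and a CUDA device is required):

```python
# Swap a full-precision Adam for the 8-bit Adam from bitsandbytes (illustrative).
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)  # 8-bit optimizer states

x = torch.randn(16, 1024, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```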
YongHuaZhang-BUAA/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
YongHuaZhang-BUAA/DeepSpeed
LLM: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
YongHuaZhang-BUAA/ExplanationIntervention
LLM: PyTorch code for "Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind"
YongHuaZhang-BUAA/FisherPruning
Finetuning: Group Fisher Pruning for Practical Network Compression (ICML 2021)
YongHuaZhang-BUAA/FlanT5-CoT-Specialization
LLM: Implementation of the ICML 2023 paper "Specializing Smaller Language Models towards Multi-Step Reasoning".
YongHuaZhang-BUAA/FlexGen
LLM: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
YongHuaZhang-BUAA/FPGA-BDF
Avnet Board Definition Files
YongHuaZhang-BUAA/gptq
LLM: Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
YongHuaZhang-BUAA/hls4ml
Machine learning on FPGAs using HLS
YongHuaZhang-BUAA/hls4ml-tutorial
Tutorial notebooks for hls4ml
YongHuaZhang-BUAA/iree
👻
YongHuaZhang-BUAA/LaMini-LM
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
YongHuaZhang-BUAA/Lion
Lion: Adversarial Distillation of Closed-Source Large Language Model
YongHuaZhang-BUAA/llm-awq
LLM: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
YongHuaZhang-BUAA/LLM-Pruner
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Supports LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.
YongHuaZhang-BUAA/lm-evaluation-harness
LLM: A framework for few-shot evaluation of autoregressive language models.
YongHuaZhang-BUAA/LMOps
General technology for enabling AI capabilities with LLMs and MLLMs
YongHuaZhang-BUAA/neural-compressor
LLM: Provides unified APIs for SOTA model compression techniques such as low-precision (INT8/INT4/FP4/NF4) quantization, sparsity, pruning, and knowledge distillation on mainstream AI frameworks such as TensorFlow, PyTorch, and ONNX Runtime.
YongHuaZhang-BUAA/owq
LLM: Code for the paper "OWQ: Lessons Learned from Activation Outliers for Weight Quantization in Large Language Models".
YongHuaZhang-BUAA/pytorch-cifar
95.47% on CIFAR10 with PyTorch
YongHuaZhang-BUAA/qlora
LLM: QLoRA: Efficient Finetuning of Quantized LLMs
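A hedged sketch of the typical QLoRA-style setup with transformers + peft; the model id and LoRA hyperparameters below are assumptions for illustration, not values from this repository:

```python
# Load a causal LM in 4-bit NF4 and attach LoRA adapters (illustrative QLoRA-style setup).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",               # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the LoRA adapters are trainable
```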
YongHuaZhang-BUAA/QuIP
Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
YongHuaZhang-BUAA/RPTQ4LLM
LLM: Reorder-based post-training quantization for large language models
YongHuaZhang-BUAA/smoothquant
LLM: [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
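A hedged sketch of the core SmoothQuant transform, the per-channel smoothing that migrates activation outliers into the weights before quantization; alpha and tensor shapes are illustrative:

```python
# Illustrative SmoothQuant-style smoothing: (X / s) @ (W * s)^T == X @ W^T,
# but X / s has much milder per-channel outliers and quantizes better.
import torch

def smooth(activations, weight, alpha=0.5):
    # activations: (tokens, in_features); weight: (out_features, in_features)
    act_scale = activations.abs().amax(dim=0)   # per input channel
    w_scale = weight.abs().amax(dim=0)          # per input channel
    s = (act_scale ** alpha) / (w_scale ** (1 - alpha))
    s = s.clamp(min=1e-5)
    return activations / s, weight * s
```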
YongHuaZhang-BUAA/Sparse-storage-formats
Sparse storage format implementations for Vitis HLS
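For context, a minimal Python illustration of the CSR (compressed sparse row) layout that such formats are built on; the repository itself targets C++ for Vitis HLS:

```python
# Convert a dense matrix to CSR: non-zero values, their column indices, and row pointers.
def dense_to_csr(matrix):
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

# Example: vals = [5, 1, 3, 2], cols = [0, 3, 1, 3], ptr = [0, 2, 2, 4]
vals, cols, ptr = dense_to_csr([[5, 0, 0, 1],
                                [0, 0, 0, 0],
                                [0, 3, 0, 2]])
```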
YongHuaZhang-BUAA/SpQR
LLM: SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
YongHuaZhang-BUAA/SqueezeLLM
LLM: SqueezeLLM: Dense-and-Sparse Quantization
YongHuaZhang-BUAA/wanda
LLM Pruning: A simple and effective LLM pruning approach.
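A hedged sketch of the Wanda importance score for one linear layer (|weight| times the L2 norm of the matching input activations, pruned per output row); shapes and the sparsity level are illustrative, not the repository's code:

```python
# Illustrative Wanda-style pruning of one linear layer.
import torch

def wanda_prune(weight, activations, sparsity=0.5):
    # weight: (out_features, in_features); activations: (tokens, in_features)
    act_norm = activations.norm(p=2, dim=0)      # per-input-channel L2 norm
    score = weight.abs() * act_norm              # importance of each weight
    k = int(weight.shape[1] * sparsity)
    # zero out the k lowest-scoring weights in every output row
    prune_idx = score.topk(k, dim=1, largest=False).indices
    mask = torch.ones_like(weight)
    mask.scatter_(1, prune_idx, 0.0)
    return weight * mask
```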
YongHuaZhang-BUAA/XilinxBoardStore