tairenpiao's Stars
Xilinx/brevitas
Brevitas: neural network quantization in PyTorch
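A minimal sketch of Brevitas in use: its quantized layers are drop-in replacements for their torch.nn counterparts (the 4-bit widths here are arbitrary example choices):

```python
import torch.nn as nn
from brevitas.nn import QuantConv2d, QuantReLU

# Quantized layers slot in where nn.Conv2d / nn.ReLU would go.
model = nn.Sequential(
    QuantConv2d(3, 16, kernel_size=3, weight_bit_width=4),  # 4-bit weights
    QuantReLU(bit_width=4),                                 # 4-bit activations
)
```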
intel/auto-round
Advanced quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
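A rough sketch of the quantization flow, assuming the AutoRound API shown in the repo's README (the model name and output path are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tune the weight rounding via signed gradient descent, then export.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./opt-125m-4bit")
```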
huggingface/optimum-quanto
A PyTorch quantization backend for Optimum
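A minimal sketch of quanto's workflow, int8 weight-only quantization of a Transformers model (the model name is a placeholder):

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import quantize, freeze, qint8

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder
quantize(model, weights=qint8)  # tag modules for int8 weight quantization
freeze(model)                   # materialize the quantized weights
```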
ollama/ollama
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
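A minimal sketch using the companion Python client, assuming a local Ollama server is running and the model tag has already been pulled:

```python
import ollama  # companion Python client; assumes `ollama serve` is up

response = ollama.chat(
    model="llama3.2",  # any locally pulled model tag
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```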
NVIDIA/TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, and distillation. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
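A rough sketch of post-training quantization with the modelopt package, assuming the mtq.quantize entry point and the INT8_DEFAULT_CFG preset name; the model and calibration data are stand-ins:

```python
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
calib_data = [torch.randn(8, 16) for _ in range(4)]  # stand-in calibration set

def forward_loop(m):
    # Run calibration batches through the model to collect ranges.
    for batch in calib_data:
        m(batch)

# Insert fake-quant ops and calibrate with a preset int8 config.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```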
meta-llama/llama3
The official Meta Llama 3 GitHub site
Nota-NetsPresso/netspresso-trainer
A library for training, compressing, and deploying computer vision models (including ViT) on edge devices
OpenBMB/ChatDev
Create customized software from a natural language idea (through LLM-powered multi-agent collaboration)
ZhangGe6/onnx-modifier
A tool to modify ONNX models visually, based on Netron and Flask.
microsoft/onnxruntime
ONNX Runtime: cross-platform, high-performance ML inferencing and training accelerator
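A minimal inference sketch with the Python API (the model path and input shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input
outputs = session.run(None, {input_name: x})  # None = return all outputs
```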
daquexian/onnx-simplifier
Simplify your ONNX model
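A minimal sketch of the Python API (the file paths are placeholders; a `onnxsim input.onnx output.onnx` CLI also exists):

```python
import onnx
from onnxsim import simplify

model = onnx.load("model.onnx")        # placeholder path
model_sim, check = simplify(model)     # fold constants, fuse redundant ops
assert check, "simplified model failed validation"
onnx.save(model_sim, "model_sim.onnx")
```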
bazelbuild/bazel
A fast, scalable, multi-language, and extensible build system
django/django
The Web framework for perfectionists with deadlines.
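A minimal view-plus-URLconf sketch; it assumes the standard app layout of a Django project, so it runs inside a project created with `django-admin startproject` rather than standalone:

```python
# views.py
from django.http import HttpResponse

def index(request):
    return HttpResponse("Hello, world.")

# urls.py -- routes the site root to the view above
from django.urls import path

urlpatterns = [path("", index)]
```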
quic/aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
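A rough sketch of quantization simulation, assuming the aimet_torch 1.x QuantizationSimModel API; the model, input shape, and calibration pass are stand-ins:

```python
import torch
import torch.nn as nn
from aimet_torch.quantsim import QuantizationSimModel

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()  # stand-in model
dummy_input = torch.randn(1, 3, 32, 32)

sim = QuantizationSimModel(model, dummy_input=dummy_input)

def forward_pass(model, _):
    model(dummy_input)  # stand-in for a real calibration loop

sim.compute_encodings(forward_pass, forward_pass_callback_args=None)
# sim.model now simulates quantized inference
```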
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
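A sketch of the quick-start flow, assuming the MLCEngine API and a prebuilt quantized model from the mlc-ai Hugging Face org:

```python
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"  # prebuilt weights
engine = MLCEngine(model)

# OpenAI-style streaming chat completion.
for chunk in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is ML compilation?"}],
    model=model,
    stream=True,
):
    for choice in chunk.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
```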
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
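A sketch of the high-level LLM API mentioned above, assuming the quick-start style entry point (the model name is a placeholder; engine build happens under the hood):

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
params = SamplingParams(max_tokens=32)

for output in llm.generate(["Hello, my name is"], params):
    print(output.outputs[0].text)
```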
PINTO0309/spo4onnx
Simple tool for partial optimization of ONNX models. It further optimizes, often by several tens of percent, models that onnx-optimizer and onnxsim cannot fully optimize, in particular models containing Einsum and OneHot.
PINTO0309/onnx2tf
Self-created tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
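A minimal conversion sketch via the Python API, assuming the keyword names from the repo's README (the paths are placeholders; a CLI of the same name also exists):

```python
import onnx2tf

# Converts an NCHW ONNX graph to an NHWC TensorFlow SavedModel,
# avoiding the redundant Transpose ops onnx-tensorflow would insert.
onnx2tf.convert(
    input_onnx_file_path="model.onnx",   # placeholder path
    output_folder_path="saved_model",
)
```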
Tencent/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
lutzroeder/netron
Visualizer for neural network, deep learning and machine learning models
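Besides the desktop and browser apps, Netron ships a small Python package; a minimal sketch (the model path is a placeholder):

```python
import netron

# Serves the model graph on a local port and opens it in the browser.
netron.start("model.onnx")  # placeholder path
```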
rust-lang/rust
Empowering everyone to build reliable and efficient software.
tensorflow/tensorflow
An Open Source Machine Learning Framework for Everyone
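A minimal Keras training sketch on random stand-in data (shapes and sizes are arbitrary):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = tf.random.normal((64, 16))                            # stand-in features
y = tf.random.uniform((64,), maxval=4, dtype=tf.int32)    # stand-in labels
model.fit(x, y, epochs=1, verbose=0)
```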
apache/tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
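A sketch of compiling an ONNX model for CPU via the Relay frontend (the model path, input name, and shape are placeholders; newer TVM is migrating toward Relax, but Relay illustrates the flow):

```python
import onnx
import tvm
from tvm import relay

onnx_model = onnx.load("model.onnx")          # placeholder path
shape_dict = {"input": (1, 3, 224, 224)}      # placeholder input name/shape
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)  # compile for CPU
```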
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
NVIDIA/FasterTransformer
Transformer-related optimizations, including BERT and GPT
triton-lang/triton
Development repository for the Triton language and compiler
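The canonical vector-add kernel, a minimal example of what Triton code looks like (requires a CUDA GPU; the block size is an arbitrary choice):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n                      # guard the tail of the array
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.rand(1024, device="cuda")
y = torch.rand(1024, device="cuda")
out = torch.empty_like(x)
add_kernel[(triton.cdiv(x.numel(), 256),)](x, y, out, x.numel(), BLOCK=256)
```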
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
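A sketch of open-set detection following the repo's inference helpers; the config, checkpoint, and image paths are placeholders, and the thresholds mirror the README defaults:

```python
from groundingdino.util.inference import load_model, load_image, predict

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",  # config from the repo
    "weights/groundingdino_swint_ogc.pth",              # downloaded checkpoint
)
image_source, image = load_image("image.jpg")           # placeholder image

boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="chair . person .",  # text prompt: categories separated by dots
    box_threshold=0.35,
    text_threshold=0.25,
)
```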
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
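A minimal point-prompt sketch following the README; the checkpoint path is a placeholder and the image is a stand-in (SAM expects an RGB uint8 HWC array):

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in RGB image
predictor.set_image(image)

# Prompt with a single foreground point (x, y).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
)
```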
optuna/optuna
A hyperparameter optimization framework
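The canonical minimal example: define an objective over a trial's suggested parameters and let the study search (the quadratic is a toy objective):

```python
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2  # minimized at x = 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```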
microsoft/nni
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
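A sketch of the trial-side API: a script launched by the NNI experiment manager pulls parameters from the tuner and reports a metric back (the training step here is a stand-in):

```python
import nni

# Inside a trial script launched by an NNI experiment:
params = nni.get_next_parameter()   # hyperparameters chosen by the tuner
lr = params.get("lr", 0.01)

accuracy = 1.0 - lr                 # stand-in for real training/evaluation
nni.report_final_result(accuracy)   # feed the metric back to the tuner
```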