inference-optimization
There are 41 repositories under the inference-optimization topic.
google/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
alibaba/BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
jiazhihao/TASO
The Tensor Algebra SuperOptimizer for Deep Learning
mit-han-lab/inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
imedslab/pytorch_bn_fusion
Batch-normalization fusion for PyTorch. This repository is archived and no longer maintained.
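For context: BN fusion folds a BatchNorm layer's statistics and affine transform into the preceding convolution's weights, so inference skips the normalization entirely. A minimal PyTorch sketch of the idea (not this repo's code; `fuse_conv_bn` is a hypothetical helper):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm statistics into the preceding convolution."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, conv.stride, conv.padding,
                      conv.dilation, conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / sigma
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused

# Sanity check: the fused conv matches conv -> bn in eval mode.
conv, bn = nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8)
bn.eval()
x = torch.randn(1, 3, 16, 16)
assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-4)
```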
ZFTurbo/Keras-inference-time-optimizer
Optimize the layer structure of Keras models to reduce computation time
Rapternmn/PyTorch-Onnx-Tensorrt
A set of tools to make your life easier with TensorRT and ONNX Runtime. This repo is designed for YOLOv3.
BaiTheBest/SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
keli-wen/AGI-Study
Blog posts, reading reports, and code examples on AGI/LLM-related topics.
lmaxwell/Armednn
cross-platform modular neural network inference library, small and efficient
ksm26/Efficiently-Serving-LLMs
Learn the ins and outs of efficiently serving large language models (LLMs). Dive into optimization techniques, including KV caching and Low-Rank Adapters (LoRA), and gain hands-on experience with Predibase's LoRAX inference server.
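KV caching stores each layer's attention keys and values so that every decode step only processes the newest token instead of re-encoding the whole prefix. A minimal greedy-decoding sketch against the Hugging Face transformers API (gpt2 chosen here only as a small public model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("The key-value cache lets us", return_tensors="pt").input_ids
past = None
with torch.no_grad():
    for _ in range(20):
        # With a cache, each step feeds only the newest token; attention
        # reuses the stored K/V tensors for all earlier positions.
        out = model(ids if past is None else ids[:, -1:],
                    past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
print(tok.decode(ids[0]))
```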
Harly-1506/Faster-Inference-yolov8
Faster YOLOv8 inference: optimize and export YOLOv8 models for faster inference using OpenVINO and NumPy 🔢
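For reference, the ultralytics package exposes the OpenVINO export path directly; a short sketch (the checkpoint name, `half=True`, and the test image are illustrative, not this repo's exact workflow):

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

model = YOLO("yolov8n.pt")                         # small pretrained checkpoint
path = model.export(format="openvino", half=True)  # writes an OpenVINO IR folder
ov_model = YOLO(path)                              # the exported IR loads back directly
results = ov_model("https://ultralytics.com/images/bus.jpg")
```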
vbdi/divprune
[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
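Diversity-based token pruning keeps a subset of visual tokens that covers the embedding space, rather than just the highest-scoring ones. A generic greedy max-min (farthest-point) selection sketch in PyTorch, as one illustration of the idea rather than DivPrune's exact criterion:

```python
import torch
import torch.nn.functional as F

def diverse_token_subset(tokens, keep):
    # Greedy max-min selection: repeatedly pick the token least similar
    # to everything chosen so far, so the kept set stays diverse.
    x = F.normalize(tokens, dim=-1)
    sim = x @ x.T                                   # pairwise cosine similarity
    chosen = [int(sim.sum(1).argmin())]             # seed with the most "isolated" token
    for _ in range(keep - 1):
        max_sim = sim[:, chosen].max(dim=1).values  # similarity to the chosen set
        max_sim[chosen] = float("inf")              # never re-pick a token
        chosen.append(int(max_sim.argmin()))
    return tokens[chosen]

vis = torch.randn(576, 1024)                 # e.g. ViT patch tokens for one image
pruned = diverse_token_subset(vis, keep=64)  # keep a diverse 64-token subset
print(pruned.shape)                          # torch.Size([64, 1024])
```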
grazder/template.cpp
A template for getting started writing code using GGML
amazon-science/llm-rank-pruning
LLM-Rank: a graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the accompanying paper.
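The underlying idea: treat units as nodes in a directed graph whose edge weights come from |W|, score nodes by weighted PageRank, and prune the least central ones. A toy NumPy sketch using plain power iteration over a tiny MLP graph (an illustration of the concept, not the paper's exact formulation):

```python
import numpy as np

def weighted_pagerank(adj, d=0.85, iters=100):
    # adj[i, j] = strength of the edge i -> j (here: |weight| between units)
    deg = adj.sum(axis=1, keepdims=True)
    P = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)  # row-stochastic
    n = adj.shape[0]
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (P.T @ r)  # standard PageRank power iteration
    return r

# Toy MLP as a graph: 4 inputs -> 3 hidden -> 2 outputs, edges weighted by |W|.
rng = np.random.default_rng(0)
W1 = np.abs(rng.normal(size=(3, 4)))   # hidden x input
W2 = np.abs(rng.normal(size=(2, 3)))   # output x hidden
adj = np.zeros((9, 9))
adj[0:4, 4:7] = W1.T                   # input -> hidden edges
adj[4:7, 7:9] = W2.T                   # hidden -> output edges
scores = weighted_pagerank(adj)[4:7]   # centrality of the hidden units
print("prune hidden unit:", int(scores.argmin()))
```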
ccs96307/fast-llm-inference
Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.
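Speculative decoding lets a small draft model propose several tokens that the large target model then verifies in a single forward pass, so the expensive model runs once per accepted batch of tokens. A simplified greedy-verification sketch, assuming both models share a tokenizer and return Hugging Face-style outputs with `.logits`:

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, ids, k=4):
    # Draft k tokens greedily with the small model (no cache, for clarity).
    proposal = ids
    for _ in range(k):
        nxt = draft(proposal).logits[:, -1].argmax(-1, keepdim=True)
        proposal = torch.cat([proposal, nxt], dim=-1)
    # One target forward pass scores every drafted position at once.
    tgt = target(proposal).logits.argmax(-1)  # target's greedy pick after each position
    n = ids.shape[1]
    accepted = 0
    for i in range(k):                        # keep the prefix both models agree on
        if tgt[0, n - 1 + i] != proposal[0, n + i]:
            break
        accepted += 1
    # Accepted draft tokens plus the target's own next token:
    # every target pass yields at least one new token.
    kept = proposal[:, : n + accepted]
    return torch.cat([kept, tgt[:, n - 1 + accepted : n + accepted]], dim=-1)
```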
EZ-Optimium/Optimium
Your AI catalyst: an inference backend that maximizes your model's inference performance
Bisonai/ncnn
Modified inference engine for quantized convolution using product quantization
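Product quantization splits each weight vector into subvectors and replaces each chunk with the nearest entry of a small learned codebook, shrinking storage and enabling table-lookup arithmetic. A toy NumPy sketch with plain k-means (an illustration of PQ itself, not ncnn's implementation):

```python
import numpy as np

def pq_codebooks(W, m=4, k=16, iters=10, seed=0):
    """Split each row of W into m subvectors; learn a k-entry codebook per chunk."""
    rng = np.random.default_rng(seed)
    books, codes = [], []
    for X in np.split(W, m, axis=1):
        C = X[rng.choice(len(X), k, replace=False)]  # init centroids from data
        for _ in range(iters):                       # a few rounds of plain k-means
            assign = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
            for j in range(k):
                if (assign == j).any():
                    C[j] = X[assign == j].mean(0)
        books.append(C)
        codes.append(assign)
    return books, codes

# Each row is approximated by concatenating its chunks' codebook entries.
W = np.random.default_rng(1).normal(size=(64, 32)).astype(np.float32)
books, codes = pq_codebooks(W)
W_hat = np.concatenate([B[c] for B, c in zip(books, codes)], axis=1)
print("reconstruction MSE:", float(((W - W_hat) ** 2).mean()))
```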
effrosyni-papanastasiou/constrained-em
A constrained expectation-maximization algorithm for feasible graph inference.
zhliuworks/Fast-MobileNetV2
🤖️ Optimized CUDA Kernels for Fast MobileNetV2 Inference
amazon-science/mlp-rank-pruning
MLP-Rank: a graph-theoretical approach to structured pruning of deep neural networks based on weighted PageRank centrality, as introduced in the accompanying thesis.
BjornMelin/local-llm-workbench
🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.
sjlee25/batch-partitioning
Batch Partitioning for Multi-PE Inference with TVM (2020)
yester31/TensorRT_Examples
TensorRT in Practice: Model Conversion, Extension, and Advanced Inference Optimization
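The usual conversion path is ONNX model → parsed network → serialized TensorRT engine. A minimal sketch against the TensorRT 8.x Python API (`model.onnx` is a placeholder file name):

```python
import tensorrt as trt  # TensorRT 8.x Python API

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:     # placeholder ONNX file
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)   # allow mixed-precision kernels
engine = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:   # reusable serialized engine
    f.write(engine)
```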
cedrickchee/pytorch-mobile-android
PyTorch Mobile: Android examples of usage in applications
kiritigowda/mivisionx-inference-analyzer
MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results
piotrostr/infer-trt
Interface for TensorRT engine inference, along with an example using a YOLOv4 engine.
shreyansh26/Accelerating-Cross-Encoder-Inference
Leveraging torch.compile to accelerate cross-encoder inference
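torch.compile wraps a model in a JIT-compiled graph: the first call pays the compilation cost, and subsequent batches run the optimized kernels. A minimal cross-encoder scoring sketch (the ms-marco-MiniLM checkpoint is one common public cross-encoder, not necessarily the one used in this repo):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # a common public cross-encoder
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

# The first call triggers compilation; later batches reuse the optimized graph.
model = torch.compile(model)

batch = tok(["what is inference optimization?"],
            ["Techniques that speed up model serving."],
            padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**batch).logits.squeeze(-1)
print(scores)
```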
aalbaali/LieBatch
Batch estimation on Lie groups
ankdeshm/inference-optimization
A compilation of various ML and DL models and ways to optimize their inference.
cedrickchee/pytorch-mobile-ios
PyTorch Mobile: iOS examples
Wb-az/YOLOv8-Image-detection
YOLOv8 object detection
booyasatoshi/quantum-annealer
Research into optimizing training and inference for AI models on CPUs using simulated quantum-annealing algorithms
matteo-stat/transformers-nlp-ner-token-classification
This repo provides scripts for fine-tuning Hugging Face Transformers, setting up pipelines, and optimizing token-classification models for inference. They are based on my experience developing a custom chatbot; I'm sharing them in the hope they will help others quickly fine-tune and use models in their projects! 😊
OneAndZero24/TRTTL
TensorRT C++ Template Library