A curated list of awesome AI and embedded/mobile-device resources, tools, and more.
Looking for contributors. Submit a pull request if you have something to add :)
Please check the contribution guidelines for info on formatting and writing pull requests.
- [1512.03385] Deep Residual Learning for Image Recognition
- [1610.02357] Xception: Deep Learning with Depthwise Separable Convolutions
- [1611.05431] ResNeXt: Aggregated Residual Transformations for Deep Neural Networks
- [1707.01209] Model compression as constrained optimization, with application to neural nets. Part I: general framework
- [1707.04319] Model compression as constrained optimization, with application to neural nets. Part II: quantization
- [SenSys ’16] Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables
- [IoT-App ’15] An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices
- [1707.06342] ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- [1707.01083] ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- [1704.04861] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- [1706.03912] SEP-Nets: Small and Effective Pattern Networks
- [1707.04693] Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- [1602.02830] Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- [1603.05279] XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- [1606.06160] DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- [CVPR'17] Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- [ICLR'17] Pruning Filters for Efficient ConvNets
- [ICLR'17] Pruning Convolutional Neural Networks for Resource Efficient Inference
- [ICLR'17] Soft Weight-Sharing for Neural Network Compression
- [ICLR'16] Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- [NIPS'16] Dynamic Network Surgery for Efficient DNNs
- [NIPS'15] Learning both Weights and Connections for Efficient Neural Networks
- [ICML'17] The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
- [1412.6115] Compressing Deep Convolutional Networks using Vector Quantization
- [CVPR '16] Quantized Convolutional Neural Networks for Mobile Devices
- [ICASSP'16] Fixed-Point Performance Analysis of Recurrent Neural Networks
- [arXiv'16] Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- [ICLR'17] Loss-aware Binarization of Deep Networks
- [ICLR'17] Towards the Limit of Network Quantization
- [CVPR'17] Deep Learning with Low Precision by Half-wave Gaussian Quantization
- [1706.02393] ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
- [CVPR'15] Efficient and Accurate Approximations of Nonlinear Convolutional Networks
- [1511.06067] Convolutional neural networks with low-rank regularization
- [NIPS'14] Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- [ICLR'16] Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- [1503.02531] Distilling the Knowledge in a Neural Network
- [AAAI'16] Face Model Compression by Distilling Knowledge from Neurons
- [1605.04614] DeepLearningKit - a GPU-Optimized Deep Learning Framework for Apple's iOS, OS X and tvOS developed in Metal and Swift
- [MobiSys '17] DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications
- [MobiSys '17] DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware
- [EMDL '17] MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU
- [WearSys '16] DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices
- [IPSN '16] DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices
- [ISCA '16] EIE: Efficient Inference Engine on Compressed Deep Neural Network
- [MobiSys '16] MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints
- [MobiCASE '16] DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit
- [MM '16] CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android
- harvardnlp/nmt-android: Neural Machine Translation on Android
- TensorFlow Android Camera Demo
- KleinYuan/Caffe2-iOS: Caffe2 on iOS Real-time Demo. Test with Your Own Model and Photos.
- MXNet Android Classification App - Image classification on Android with MXNet.
- bwasti/AICamera: Demonstration of using Caffe2 inside an Android application.
- mtmd/Mobile_ConvNet: RenderScript based implementation of Convolutional Neural Networks for Android phones
- MXNet iOS Classification App - Image classification on iOS with MXNet.
- Compile MXNet on Xcode (in Chinese) - a step-by-step tutorial on compiling MXNet on Xcode for iOS apps
- KimDarren/FaceCropper: Crop faces inside your image with the iOS 11 Vision API.
- hollance/TensorFlow-iOS-Example: Source code for my blog post "Getting started with TensorFlow on iOS"
- SaschaWillems/Vulkan: Examples and demos for the new Vulkan API
- ARM-software/vulkan-sdk: ARM Vulkan SDK
- alexhultman/libvc: Vulkan Compute for C++ (experimentation project)
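Several of the papers above (Xception, MobileNets, ShuffleNet) build on depthwise separable convolutions. A minimal sketch of the parameter savings, with layer shapes chosen purely for illustration:

```python
# Illustrative layer shape (not from any specific paper):
# 3x3 kernels, 32 input channels, 64 output channels.
c_in, c_out, k = 32, 64, 3

# Standard convolution: one k x k x c_in filter per output channel.
standard_params = k * k * c_in * c_out

# Depthwise separable convolution: a k x k depthwise filter per
# input channel, followed by a 1x1 pointwise convolution that
# mixes channels.
separable_params = k * k * c_in + c_in * c_out

print(standard_params, separable_params)
print(f"{standard_params / separable_params:.1f}x fewer parameters")
```

The same arithmetic applies to multiply-accumulate counts, which is why these architectures target mobile and embedded inference.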
General frameworks include both the inference and backprop (training) stages; inference frameworks support the inference stage only.
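The distinction can be sketched with a toy linear model (hypothetical shapes, NumPy only): an inference framework needs just the forward pass, while a general framework must also compute gradients and update weights:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))         # toy weight matrix
x = rng.normal(size=3)              # one input sample
t = np.array([0.0, 1.0, 0.0, 0.0])  # target output

# Inference stage: forward pass only.
y = W @ x

# Backprop stage (general frameworks only): gradient of a squared
# error loss with respect to the weights, plus one SGD update.
loss = 0.5 * np.sum((y - t) ** 2)
grad_W = np.outer(y - t, x)         # dL/dW for the loss above
W = W - 0.1 * grad_W
```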
- Deep Learning in a Single File for Smart Devices — mxnet
- ARM-software/ComputeLibrary: The ARM Computer Vision and Machine Learning library, a set of functions optimised for both ARM CPUs and GPUs using SIMD technologies
- Apple CoreML
- Microsoft Embedded Learning Library
- mil-tokyo/webdnn: Fastest DNN Execution Framework on Web Browser
- jiaxiang-wu/quantized-cnn: An efficient framework for convolutional neural networks
- Tencent/ncnn: ncnn is a high-performance neural network inference framework optimized for the mobile platform
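Mobile inference frameworks such as ncnn and quantized-cnn lean heavily on low-precision arithmetic. A minimal sketch of symmetric per-tensor int8 weight quantization (a simplified scheme for illustration, not any framework's exact implementation):

```python
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 codes with a single per-tensor scale."""
    scale = np.abs(w).max() / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.4, -1.0, 0.25, 0.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q)      # int8 codes
print(w_hat)  # close to the original weights
```

Int8 storage quarters the model size versus float32, and the integer codes map directly onto the SIMD multiply-accumulate instructions these frameworks exploit.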
Model converters. For more converters, see deep-learning-model-convertor.
This part contains related courses, guides and tutorials.
- Deep learning systems: UW course schedule (focused on systems design, not learning)
- Squeezing Deep Learning Into Mobile Phones
- Deep Learning – Tutorial and Recent Trends
- Efficient Convolutional Neural Network Inference on Mobile GPUs
- ARM® Mali™ GPU OpenCL Developer Guide (HTML, PDF)
- Optimal Compute on ARM® Mali™ GPUs
- GPU Compute for Mobile Devices
- Compute for Mobile Devices (performance-focused)
- Hands On OpenCL
- Adreno OpenCL Programming Guide
- Better OpenCL Performance on Qualcomm Adreno GPU