This repository contains the source code for training computer vision models. Specifically, it contains the source code of the MobileViT paper for the following tasks:
- Image classification on the ImageNet dataset
- Object detection using SSD
- Semantic segmentation using Deeplabv3
Note: Any image classification backbone can be used with the object detection and semantic segmentation models (a minimal sketch of this idea is shown below).
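The snippet below is a self-contained sketch of that idea, not the CVNets API: a torchvision MobileNetV2 classification backbone is reused as the encoder of a toy segmentation model. The class count and input resolution are placeholders.

```python
# Sketch only: reuse a classification backbone as the encoder of a
# dense-prediction model. Not the CVNets API; MobileNetV2, the class
# count, and the input size are stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

class SimpleSegModel(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        self.backbone = mobilenet_v2().features          # classification backbone without its classifier head
        self.head = nn.Conv2d(1280, num_classes, kernel_size=1)  # per-pixel class logits

    def forward(self, x):
        feats = self.backbone(x)                          # (N, 1280, H/32, W/32)
        logits = self.head(feats)
        # upsample logits back to the input resolution
        return F.interpolate(logits, size=x.shape[-2:], mode="bilinear", align_corners=False)

model = SimpleSegModel()
out = model(torch.rand(1, 3, 256, 256))                   # -> (1, 21, 256, 256)
```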
Training can be done with two samplers:
- Standard distributed sampler
- Multi-scale distributed sampler
We recommend using the multi-scale sampler, as it improves generalization and leads to better performance; see the MobileViT paper for details. A conceptual sketch of such a sampler is given below.
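For illustration, here is a conceptual sketch of a multi-scale batch sampler, not the CVNets implementation: each batch is assigned a randomly chosen training resolution, and the batch size is scaled so that the total number of pixels per batch stays roughly constant. The dataset's `__getitem__` is assumed to accept an `(image_size, index)` pair and resize the sample accordingly.

```python
# Conceptual multi-scale batch sampler (sketch, not the CVNets code):
# every batch gets a random training resolution, and the batch size is
# rescaled so memory use stays roughly constant across resolutions.
import random
from torch.utils.data import Sampler

class MultiScaleBatchSampler(Sampler):
    def __init__(self, n_samples, base_size=256, base_batch_size=128,
                 scales=(0.5, 0.75, 1.0, 1.25)):
        self.n_samples = n_samples
        self.size_batch_pairs = []
        for s in scales:
            size = int(base_size * s)
            # keep (batch_size * H * W) roughly constant across scales
            batch = max(1, int(base_batch_size * (base_size / size) ** 2))
            self.size_batch_pairs.append((size, batch))

    def __iter__(self):
        indices = list(range(self.n_samples))
        random.shuffle(indices)
        start = 0
        while start < self.n_samples:
            size, batch = random.choice(self.size_batch_pairs)
            # yield one batch: each element carries the chosen resolution
            yield [(size, idx) for idx in indices[start:start + batch]]
            start += batch
```

Such a sampler would be passed to a `DataLoader` through its `batch_sampler` argument, e.g. `DataLoader(dataset, batch_sampler=MultiScaleBatchSampler(len(dataset)))`, with the dataset handling the resize.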
CVNets can be installed in a local Python environment using the following commands:
```bash
git clone git@github.com:apple/ml-cvnets.git
cd ml-cvnets
pip install -r requirements.txt
pip install --editable .
```
We recommend using Python 3.6+ and PyTorch (version >= v1.8.0) with a conda
environment. For setting up the Python environment with conda, see here.
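Once installed, a quick sanity check can confirm the setup. This is a sketch that assumes the editable install exposes the package as `cvnets`; adjust the import if your checkout differs.

```python
# Post-install sanity check (assumption: the editable install provides `cvnets`).
import torch
import cvnets  # noqa: F401

major, minor = (int(v) for v in torch.__version__.split("+")[0].split(".")[:2])
assert (major, minor) >= (1, 8), "CVNets expects PyTorch v1.8.0 or newer"
print(f"PyTorch {torch.__version__} and cvnets imported successfully")
```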
- General instructions for training and evaluating different models are given here.
- Examples for training and evaluating specific models are provided in the examples folder. Right now, we support the following models.
- For converting PyTorch models to CoreML, see README-pytorch-to-coreml.md (an illustrative conversion sketch is given after this list).
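For reference, the sketch below shows the generic PyTorch-to-CoreML path with `torch.jit.trace` and coremltools; README-pytorch-to-coreml.md documents the repository's own conversion script. The torchvision MobileNetV2 stand-in and the 256x256 input used here are placeholders.

```python
# Illustrative PyTorch -> CoreML conversion (sketch; see README-pytorch-to-coreml.md
# for the repository's own script). Model and input size are placeholders.
import torch
import coremltools as ct
from torchvision.models import mobilenet_v2

model = mobilenet_v2().eval()                      # stand-in for a trained model
example_input = torch.rand(1, 3, 256, 256)         # N, C, H, W

traced = torch.jit.trace(model, example_input)     # TorchScript graph for conversion

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="images", shape=example_input.shape)],
    convert_to="mlprogram",                        # ML Program backend; saved as .mlpackage
)
mlmodel.save("mobilenet_v2.mlpackage")
```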
If you find our work useful, please cite the following paper:
```bibtex
@article{mehta2021mobilevit,
  title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author={Mehta, Sachin and Rastegari, Mohammad},
  journal={arXiv preprint arXiv:2110.02178},
  year={2021}
}
```