Pet Face Detector 👁 🐶 🐱

Using a RetinaNet to detect the faces of common pet breeds.

Go to this link to preview the web app!

The model not only detects the faces of pets but also classifies the breed of the animal.

The model has been trained on the following breeds:

This project is built on top of:

  • PyTorch
  • PyTorchLightning
  • Torchvision
  • Albumentations
  • Streamlit

Dataset used:

For training the models, The Oxford-IIIT Pet Dataset has been used, which can be found here. Two pretrained models for detection are available: RetinaNet with a resnet50 backbone and RetinaNet with a resnet34 backbone. These pretrained models can be selected via the .yaml files present in the config/ dir.
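
For orientation, below is a minimal sketch of how such a model can be assembled with torchvision's detection API (requires torchvision >= 0.8). This is a hedged illustration, not the repo's actual model code; the 37-class head (one class per Oxford-IIIT Pet breed) and the checkpoint path are assumptions.

    # A hedged sketch, NOT the repo's actual model-building code.
    import torch
    from torchvision.models.detection import RetinaNet
    from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

    # Pick the backbone matching the config you plan to use.
    backbone = resnet_fpn_backbone("resnet34", pretrained=True)  # or "resnet50"
    model = RetinaNet(backbone, num_classes=37)  # assumption: 37 pet breeds

    # Weights are saved as plain state_dicts (see the training section below);
    # the path here is an assumed example.
    model.load_state_dict(torch.load("checkpoints/weights.pth", map_location="cpu"))
    model.eval()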

TODO:

Tutorials

Usage:

  • Install Python 3
  • Install dependencies
    $ git clone --recurse-submodules -j8 https://github.com/benihime91/retinanet_pet_detector.git
    $ cd retinanet_pet_detector
    $ pip install -r requirements.txt
  • Run app
    $ streamlit run app.py

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install, run:

$ pip install -r requirements.txt

Inference with pre-trained weights:

$ python inference.py \
      --config "config/resnet34.yaml" \
      --image "/content/oxford-iiit-pet/images/german_shorthaired_128.jpg" \
      --save_dir "/content/" \
      --fname "res_1.png"

or

$ python inference.py \
      --config "config/resnet50.yaml" \
      --image "/content/oxford-iiit-pet/images/german_shorthaired_128.jpg" \
      --save_dir "/content/" \
      --fname "res_1.png"

Flags:

 $ python inference.py --help
    usage: inference.py [-h] [--config CONFIG] --image IMAGE
                    [--score_thres SCORE_THRES] [--iou_thres IOU_THRES]
                    [--md MD] [--save SAVE] [--show SHOW]
                    [--save_dir SAVE_DIR] [--fname FNAME]

    optional arguments:
      -h, --help            show this help message and exit
      --config CONFIG       path to the config file
      --image IMAGE         path to the input image
      --score_thres SCORE_THRES
                            score_threshold to threshold detections
      --iou_thres IOU_THRES
                            iou_threshold for bounding boxes
      --md MD               max detections in the image
      --save SAVE           whether to save the output predictions
      --show SHOW           whether to display the output predictions
      --save_dir SAVE_DIR   directory where to save the output predictions
      --fname FNAME         name of the output prediction file
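
For intuition, the sketch below shows how these thresholds are typically applied to raw detections. It is a hedged illustration, not inference.py's actual post-processing; the function filter_detections and its default values are made up for the example.

    # Hedged post-processing sketch; inference.py's internals may differ.
    import torch
    from torchvision.ops import nms

    def filter_detections(boxes, scores, labels,
                          score_thres=0.6, iou_thres=0.5, max_det=100):
        keep = scores > score_thres           # --score_thres: drop weak detections
        boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
        keep = nms(boxes, scores, iou_thres)  # --iou_thres: suppress overlapping boxes
        keep = keep[:max_det]                 # --md: cap the number of detections
        return boxes[keep], scores[keep], labels[keep]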

Training Procedure:

  • Clone the Repo:

    $ git clone --recurse-submodules -j8 https://github.com/benihime91/retinanet_pet_detector.git
    $ cd retinanet_pet_detector
  • Ensure all requirements are installed. To train on a GPU you need to install the PyTorch GPU build; download it from here. Then comment out the first 2 lines of requirements.txt. After that:

    $ pip install -r requirements.txt
  • Download the dataset from here.

  • After downloading the dataset, run prep_data.py to convert the XML annotations into a CSV file and to create the train, validation and test splits.

    $ python prep_data.py --help
      usage: prep_data.py [-h] [--action {create,split}] [--img_dir IMG_DIR]
                         [--annot_dir ANNOT_DIR] [--labels LABELS] [--csv CSV]
                         [--valid_size VALID_SIZE] [--test_size TEST_SIZE]
                         [--output_dir OUTPUT_DIR] [--seed SEED]
    
      optional arguments:
        -h, --help            show this help message and exit
        --action {create,split}
        --img_dir IMG_DIR     path to the image directory
        --annot_dir ANNOT_DIR
                              path to the annotation directory
        --labels LABELS       path to the label dictionary
        --csv CSV             path to the csv file
        --valid_size VALID_SIZE
                              size of the validation set relative to the train set
        --test_size TEST_SIZE
                              size of the test set relative to the validation set
        --output_dir OUTPUT_DIR
                              path to the output csv file
        --seed SEED           random seed

    This command converts the XML annotations to a CSV file. Change --img_dir to the path where the dataset images are stored, --annot_dir to the path where the XML annotations are stored, and --labels to where the labels.names file is stored (it ships in data/labels.names). The CSV file will be saved in --output_dir as data-full.csv. A hedged sketch of what this conversion does follows the command below.

    $ python prep_data.py \
        --action create \
        --img_dir "/content/oxford-iiit-pet/images" \
        --annot_dir "/content/oxford-iiit-pet/annotations/xmls" \
        --labels "/content/retinanet_pet_detector/data/labels.names" \
        --output_dir "/content/retinanet_pet_detector/data/"
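
    A minimal sketch of the conversion step, assuming VOC-style XML annotations (the format the Oxford-IIIT Pet dataset ships with); prep_data.py's actual column names and layout may differ.

    # Hedged illustration of the XML -> CSV conversion, not prep_data.py itself.
    import csv
    import xml.etree.ElementTree as ET
    from pathlib import Path

    def xml_to_rows(xml_path):
        root = ET.parse(xml_path).getroot()
        filename = root.findtext("filename")
        for obj in root.iter("object"):  # one row per annotated face
            box = obj.find("bndbox")
            yield [filename, obj.findtext("name"),
                   int(box.findtext("xmin")), int(box.findtext("ymin")),
                   int(box.findtext("xmax")), int(box.findtext("ymax"))]

    with open("data-full.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label", "xmin", "ymin", "xmax", "ymax"])
        for xml_file in Path("annotations/xmls").glob("*.xml"):
            writer.writerows(xml_to_rows(xml_file))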

    Run this command to create the training, validation and test splits.
    The datasets will be saved in --output_dir as train.csv, valid.csv and test.csv.
    Set the --csv argument to the path of the data-full.csv file generated above.
    You can also pass the --seed argument to make the splits reproducible; a hedged sketch of the split logic follows the command below.

    $ python prep_data.py \
        --action split \
        --csv "/content/retinanet_pet_detector/data/data-full.csv" \
        --valid_size 0.3 \
        --test_size 0.5 \
        --output_dir "/content/retinanet_pet_detector/data/" \
        --seed 123
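
    A minimal sketch of the split logic, assuming pandas and scikit-learn; it mirrors the flag semantics above, where --test_size is carved out of the validation split.

    # Hedged illustration of the split step, not prep_data.py itself.
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("data-full.csv")
    train, valid = train_test_split(df, test_size=0.3, random_state=123)   # --valid_size
    valid, test = train_test_split(valid, test_size=0.5, random_state=123) # --test_size

    for name, split in [("train", train), ("valid", valid), ("test", test)]:
        split.to_csv(f"{name}.csv", index=False)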
  • Training is controlled by the main.yaml file. Before training, ensure that the paths in main.yaml (hparams.train_csv, hparams.valid_csv, hparams.test_csv) point to the files generated above.
    If not training on GPU change these arguments:

    • trainer.gpus = 0
    • trainer.precision = 32

    In the same way, the other flags in main.yaml can be modified.
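
    For example, here is a hedged sketch of making these edits programmatically with PyYAML, assuming main.yaml is plain YAML with a nested trainer section as the flag names above suggest:

    # Assumption: main.yaml is plain YAML; the repo may use another config loader.
    import yaml

    with open("config/main.yaml") as f:
        cfg = yaml.safe_load(f)

    cfg["trainer"]["gpus"] = 0        # train on CPU
    cfg["trainer"]["precision"] = 32  # 16-bit precision needs a GPU

    with open("config/main.yaml", "w") as f:
        yaml.safe_dump(cfg, f)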

  • To train, run this command. The --config argument points to the path where the main.yaml file is saved.

    $ python train.py \
       --config "/content/retinanet_pet_detector/config/main.yaml" \
       --verbose 0

    Model weights are automatically saved as state_dicts to the filepath specified by trainer.model_checkpoint.params.filepath in main.yaml, as weights.pth.

  • For inference, modify the config/resnet34.yaml or config/resnet50.yaml file. Set the url to the path where the weights are saved, e.g. checkpoints/weights.pth.

    • --config : corresponds to the path where the config/resnet34.yaml or config/resnet50.yaml file is saved.
    • --image : corresponds to the path of the image.
    • Results are saved as {save_dir}/{fname}.
    $ python inference.py \
        --config "/content/retinanet_pet_detector/config/resnet50.yaml" \
        --image "/content/oxford-iiit-pet/images/german_shorthaired_128.jpg" \
        --score_thres 0.7 \
        --iou_thres 0.4 \
        --save_dir "/content/" \
        --fname "res_1.png"

    or

    $ python inference.py \
          --config "/content/retinanet_pet_detector/config/resnet34.yaml" \
          --image "/content/oxford-iiit-pet/images/german_shorthaired_128.jpg" \
          --save_dir "/content/" \
          --fname "res_1.png"
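
    If you prefer to call the model from Python, here is a hedged end-to-end sketch assuming a torchvision-style RetinaNet as in the earlier sketch; inference.py additionally handles config parsing, drawing and saving the predictions. The checkpoint path and the 37-class head are assumptions.

    # Hedged programmatic-inference sketch, not the repo's inference.py.
    import torch
    from PIL import Image
    from torchvision.models.detection import RetinaNet
    from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
    from torchvision.transforms.functional import to_tensor

    model = RetinaNet(resnet_fpn_backbone("resnet50", pretrained=True), num_classes=37)
    model.load_state_dict(torch.load("checkpoints/weights.pth", map_location="cpu"))
    model.eval()

    img = Image.open("german_shorthaired_128.jpg").convert("RGB")
    with torch.no_grad():
        preds = model([to_tensor(img)])[0]
    # preds is a dict with "boxes", "labels" and "scores" tensors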
  • To view tensorboard logs:

    $ tensorboard --logdir "logs/"

Results:

  • Results for RetinaNet model with resnet34 backbone:

    [09/19 13:37:58 references.lightning]: Evaluation results for bbox: 
    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.576
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.608
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.500
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.576
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.544
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.624
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.624
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.500
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.628
  • Results for RetinaNet model with resnet50 backbone:

    [09/20 12:39:13 references.lightning]: Evaluation results for bbox: 
    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.600
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.979
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.604
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.606
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.619
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.619
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.619