This repository contains the code for a class homework assignment. For a detailed description of the task and the approach, please refer to my report.
The following specs were used to create the original solution.
- Windows 10
- Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz 2.50GHz
- NVIDIA GeForce GTX 1660 Ti
To reproduce my result without retraining, follow these steps:
See this for how to use mmdetection, and here for how to install it.
Download the Street View House Numbers data here. Unzip it, and you should see the following structure:
street-view-house-numbers-detection/
├── train
│ ├── 1.png
│ ├── 2.png
│ │ .
│ │ .
│ │ .
│ ├── 33402.png
│ ├── digitStruct.mat
│ └── see_bboxes.m
├── test
│ ├── 1.png
│ ├── 2.png
│ │ .
│ │ .
│ │ .
│ └── 13068.png
├── to_coco_format.py
│ .
│ .
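Before moving on, it can help to confirm that the unzipped data matches the layout above. The following is a minimal sketch, not part of this repository: the helper name `missing_paths` and the short `EXPECTED` list are assumptions chosen for illustration, and it only spot-checks a few of the files shown in the tree.

```python
from pathlib import Path

# A few representative paths taken from the tree above (not an exhaustive list).
EXPECTED = [
    "train/digitStruct.mat",
    "train/1.png",
    "test/1.png",
    "to_coco_format.py",
]

def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]

# Run from the repository root; an empty list means the spot check passed.
print(missing_paths("."))
```

An empty list only means these sample files are present; it does not verify that all 33402 training images and 13068 test images were extracted.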
Before training or running inference, the data must be converted to COCO format. Run the following command:
$ python to_coco_format.py
After it finishes, some new files appear in the preprocess_file folder, like this:
street-view-house-numbers-detection/train/
├── 1.png
├── 2.png
│ .
│ .
│ .
├── 33402.png
├── digitStruct.mat
├── see_bboxes.m
├── train_data_processed.json
├── val_data_processed.json
└── error_data_processed.json
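For readers unfamiliar with the COCO format, the sketch below shows roughly what such a conversion produces. It is not the code in to_coco_format.py: the function name `to_coco` and the intermediate `records` structure are hypothetical, and it assumes each box already has `left`, `top`, `width`, `height` and a `label` that is a valid category id.

```python
import json

def to_coco(records, categories):
    """Build a COCO-style dict from per-image box records.

    `records` is a hypothetical intermediate form: a list of dicts with
    'file_name', 'width', 'height' and 'boxes', where each box has
    'left', 'top', 'width', 'height' and 'label'.
    """
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for i, n in enumerate(categories)],
    }
    ann_id = 1
    for image_id, rec in enumerate(records, start=1):
        coco["images"].append({
            "id": image_id,
            "file_name": rec["file_name"],
            "width": rec["width"],
            "height": rec["height"],
        })
        for box in rec["boxes"]:
            w, h = box["width"], box["height"]
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": box["label"],
                # COCO boxes are [x_min, y_min, width, height]
                "bbox": [box["left"], box["top"], w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco

# Made-up example record, only to show the output shape.
example = to_coco(
    [{"file_name": "1.png", "width": 100, "height": 50,
      "boxes": [{"left": 10, "top": 5, "width": 20, "height": 30, "label": 1}]}],
    categories=[str(d) for d in range(10)],
)
print(json.dumps(example["annotations"][0]))
```

How the digit labels in digitStruct.mat map to category ids is a choice made by the real script; the 0 to 9 mapping here is only an assumption.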
To train the model, run the following command:
$ python tool/train.py config/faster_rcnn/faster_rcnn_r34_fpn_1x_coco.py
You can download the pretrained model used for my submission from link, or run the following commands.
Note: Windows 10 has no default unzip command, so you must unzip the file through the GUI.
$ wget https://drive.google.com/file/d/1XkKurOTGL-PFJfLlPGpkKmwGintVcT04/view?usp=sharing
$ unzip model_wide_resnet.zip
Unzip the archive and place its contents in the following structure:
street-view-house-numbers-detection/work_dirs/
└── faster_rcnn_r34_fpn_1x_coco
    └── latest.pth
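As an alternative to unzipping through the GUI on Windows, Python's standard `zipfile` module can extract the archive. This is a minimal sketch: the helper name `extract_weights` is hypothetical, and it assumes the downloaded file is named model_wide_resnet.zip as in the command above. The internal layout of the archive is not documented here, so you may still need to move latest.pth into place afterwards.

```python
import zipfile
from pathlib import Path

def extract_weights(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create work_dirs/... if missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()

if __name__ == "__main__":
    zip_file = Path("model_wide_resnet.zip")
    if zip_file.exists():
        names = extract_weights(zip_file, "work_dirs/faster_rcnn_r34_fpn_1x_coco")
        print(names)  # check where latest.pth ended up
```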
Once the trained weights are in place, you can generate a file containing the bounding boxes for each image in the test set.
Using the pre-trained model, enter the command:
$ python inference_demo.py
The output file 0856735_xx.json will then appear in the result folder.