Paddle-SVHN

This project reproduces "Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks" [1] with the PaddlePaddle framework, as an entry in the Baidu paper reproduction competition.

1 Introduction

The main idea of this exercise is to study how the state of the art in visual attention models has evolved, along with the main work on the topic. Two datasets are studied: augmented MNIST and SVHN. The former addresses a canonical problem, handwritten digit recognition, with added clutter and translation; the latter addresses a real-world problem, street view house number (SVHN) transcription. The work listed below is studied to build the intuition needed to choose a suitable model for each of these challenges.

Paper:

[1] Goodfellow I J, Bulatov Y, Ibarz J, et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks[J]. arXiv preprint arXiv:1312.6082, 2013.

Reference project

https://github.com/potterhsu/SVHNClassifier-PyTorch

AI Studio project

https://aistudio.baidu.com/aistudio/projectdetail/2598446?contributionType=1

2 Results comparison

Methods	Model Download	Batch Size	Learning Rate	Patience	Decay Step	Decay Rate	Training Speed (FPS)	Accuracy
Pytorch_SVHN	torch_model	512	0.16	100	625	0.9	~1700	95.65%
Paddle_SVHN	paddle_model (Extraction_code: v4yj)	1024	0.01	100	625	0.9	~1700	95.71%
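
The Decay Step and Decay Rate columns suggest a step-wise exponential learning-rate schedule. The sketch below assumes staircase decay, where the rate is multiplied by 0.9 once every 625 optimizer steps. The function name is hypothetical, and train.py may apply the drop at a slightly different step than this formula does.

```python
def staircase_lr(initial_lr, step, decay_steps=625, decay_rate=0.9):
    """Learning rate after `step` optimizer steps under assumed staircase decay."""
    # Integer division keeps the rate constant within each block of decay_steps.
    return initial_lr * decay_rate ** (step // decay_steps)


# Example with the Paddle_SVHN starting rate from the table above.
for step in (0, 624, 625, 1250):
    print(step, round(staircase_lr(0.01, step), 6))
```

With these values the rate falls by 10% per 625 steps, so it halves roughly every 4,100 steps.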

3 Dataset

SVHN Dataset format 1
- test.tar.gz
- train.tar.gz
- extra.tar.gz

4 Recommended environment

Python 3.6
paddlepaddle-gpu 2.0.2
visdom
protobuf
lmdb

5 Start

step1: Clone

git clone https://github.com/JennyVanessa/Paddle-SVHN.git
cd Paddle-SVHN

step2: Pip install

Install PaddlePaddle by following the official installation guide, then install the remaining dependencies:

pip install visdom
pip install h5py
pip install protobuf
pip install lmdb

step3: Download dataset

Download SVHN Dataset format 1

Extract the archives into the data folder. The folder structure should then look like this:

Paddle-SVHN
    - data
        - extra
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - test
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - train
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
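
Before converting, it can help to confirm that the extracted layout matches the tree above. The following is a minimal sketch and is not part of this repository; the function name `missing_svhn_files` is hypothetical, and it only checks that each split has a digitStruct.mat file and at least one PNG image.

```python
from pathlib import Path

SPLITS = ("train", "test", "extra")


def missing_svhn_files(data_dir):
    """Return a list of expected paths that are absent under data_dir."""
    root = Path(data_dir)
    missing = []
    for split in SPLITS:
        split_dir = root / split
        # Each split needs its bounding-box annotation file.
        if not (split_dir / "digitStruct.mat").is_file():
            missing.append(str(split_dir / "digitStruct.mat"))
        # Each split needs at least one image.
        if not split_dir.is_dir() or not any(split_dir.glob("*.png")):
            missing.append(str(split_dir / "*.png"))
    return missing


if __name__ == "__main__":
    problems = missing_svhn_files("./data")
    if problems:
        print("missing: " + ", ".join(problems))
    else:
        print("layout looks complete")
```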

step4: Convert dataset to LMDB format

$ python convert_to_lmdb.py --data_dir ./data

step5: Train

The training log is appended to train.log, and model checkpoints are saved in the ./logs directory.

$ python train.py --data_dir ./data --logdir ./logs >> train.log

The output is:

data/test.lmdb
Start training
=> 2021-11-01 16:47:58.488561: step 100, loss = 7.821150, learning_rate = 0.100000 (2290.5 examples/sec)
=> 2021-11-01 16:48:43.666231: step 200, loss = 7.897614, learning_rate = 0.100000 (2284.5 examples/sec)
=> 2021-11-01 16:49:30.083858: step 300, loss = 7.818493, learning_rate = 0.100000 (2293.1 examples/sec)
=> 2021-11-01 16:50:15.438407: step 400, loss = 7.806008, learning_rate = 0.100000 (2276.1 examples/sec)
=> 2021-11-01 16:51:02.383038: step 500, loss = 7.821648, learning_rate = 0.100000 (2284.2 examples/sec)
=> 2021-11-01 16:51:47.870291: step 600, loss = 7.811975, learning_rate = 0.100000 (2269.1 examples/sec)
=> 2021-11-01 16:52:34.556187: step 700, loss = 7.864832, learning_rate = 0.100000 (2283.2 examples/sec)
=> 2021-11-01 16:53:20.091155: step 800, loss = 7.786717, learning_rate = 0.090000 (2266.6 examples/sec)
=> 2021-11-01 16:54:06.938361: step 900, loss = 7.849339, learning_rate = 0.090000 (2278.9 examples/sec)
=> 2021-11-01 16:54:52.568350: step 1000, loss = 7.795635, learning_rate = 0.090000 (2261.9 examples/sec)
=> Model saved to file: ./logs/model-1000.pdparams
=> patience = 100
=> Evaluating on validation dataset...
==> accuracy = 0.022880, best accuracy 0.000000
...
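
The `patience` and `best accuracy` lines in the log point to patience-based early stopping. The sketch below shows one common form of that rule and is an assumption about how train.py behaves, not a copy of it: patience is reset when validation accuracy improves and decremented otherwise, and training stops when it reaches zero. The function name is hypothetical.

```python
def update_patience(accuracy, best_accuracy, patience, initial_patience=100):
    """Return (best_accuracy, patience) after one validation round."""
    if accuracy > best_accuracy:
        # Improvement: remember the new best and reset the counter.
        return accuracy, initial_patience
    # No improvement: use up one unit of patience.
    return best_accuracy, patience - 1


# Toy run with a small patience so the stop condition is easy to follow.
best, patience = 0.0, 2
for acc in (0.50, 0.48, 0.49):
    best, patience = update_patience(acc, best, patience, initial_patience=2)
    if patience == 0:
        print("stop training, best accuracy", best)
        break
```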

To resume training from a saved checkpoint:

$ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pdparams

step6: Evaluate

$ python eval.py --data_dir ./data ./logs/model-100.pdparams

The output is:

Start evaluating
Evaluate /home/aistudio/logs/model-359000.pdparams on /home/aistudio/data/test.lmdb, accuracy = 0.953551
Done
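
For multi-digit transcription, accuracy is usually counted per whole sequence rather than per digit. The sketch below assumes that convention: a sample is correct only if the predicted length and all five digit slots match the label. It is an illustration with a hypothetical function name, not the code in evaluator.py.

```python
def sequence_accuracy(predictions, labels):
    """Fraction of samples whose length and all digit slots match.

    Each item is a (length, digits) pair, where digits holds 5 integers.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    correct = 0
    for (pred_len, pred_digits), (true_len, true_digits) in zip(predictions, labels):
        # One wrong slot makes the whole sequence wrong.
        if pred_len == true_len and list(pred_digits) == list(true_digits):
            correct += 1
    return correct / len(labels)


preds = [(2, [7, 5, 10, 10, 10]), (3, [1, 2, 3, 10, 10])]
truth = [(2, [7, 5, 10, 10, 10]), (3, [1, 2, 4, 10, 10])]
print(sequence_accuracy(preds, truth))  # prints 0.5
```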

step7: Infer

$ python infer.py --checkpoint=./logs/model-100.pdparams ./image/test1.png

Input image: test1.png (image not shown)

The inference output is:

length: 2
digits: 7 5 10 10 10
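
The output has five digit slots, and the value 10 appears to mark an unused slot, so the result above reads as the house number 75. A minimal sketch of that decoding follows, assuming 10 is the "no digit" class; the function name is hypothetical and not part of infer.py.

```python
BLANK = 10  # assumed "no digit" class used to pad the five slots


def decode_prediction(length, digits):
    """Turn a (length, digits) prediction into a house-number string."""
    # Keep only the first `length` slots and drop any padding among them.
    kept = [d for d in digits[:length] if d != BLANK]
    return "".join(str(d) for d in kept)


print(decode_prediction(2, [7, 5, 10, 10, 10]))  # prints 75
```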

step8: Clean

$ rm -rf ./logs
or
$ rm -rf ./logs_retrain

6 Code Structure

├─convert_to_lmdb.py                         
├─dataset.py                
├─eval.py                           
├─model.py    
├─evaluator.py
├─draw_bbox.py
├─example_pb2.py
├─infer.py
├─read_lmdb_sample.py
├─visiualize.py
├─train.py
├─train.log
└─images
   └─test1.png