This project reproduces "Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks" [1] with the PaddlePaddle framework and is an entry in the Baidu paper reproduction competition.
The main idea of this exercise is to study how the state of the art and the main work on visual attention models have evolved. Two datasets are studied: augmented MNIST and SVHN. The former targets a canonical problem, handwritten digit recognition, but with clutter and translation; the latter targets a real-world problem, street view house number (SVHN) transcription. The following paper is studied to build the intuition needed to choose a suitable model for each of these challenges.
- [1] Goodfellow I J, Bulatov Y, Ibarz J, et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks[J]. arXiv preprint arXiv:1312.6082, 2013.
Methods | Model Download | Batch Size | Learning Rate | Patience | Decay Step | Decay Rate | Training Speed (FPS) | Accuracy |
---|---|---|---|---|---|---|---|---|
Pytorch_SVHN | torch_model | 512 | 0.16 | 100 | 625 | 0.9 | ~1700 | 95.65% |
Paddle_SVHN | paddle_model (Extraction_code: v4yj) | 1024 | 0.01 | 100 | 625 | 0.9 | ~1700 | 95.71% |
- SVHN Dataset format 1
  - test.tar.gz
  - train.tar.gz
  - extra.tar.gz
- Python 3.6
- paddlepaddle-gpu 2.0.2
- visdom
- h5py
- protobuf
- lmdb
git clone https://github.com/JennyVanessa/Paddle-SVHN.git
cd Paddle-SVHN
Install PaddlePaddle following the official installation tutorial.
pip install visdom
pip install h5py
pip install protobuf
pip install lmdb
- Download SVHN Dataset format 1
- Extract it to the data folder. Your folder structure should now look like this:

SVHNClassifier
- data
  - extra
    - 1.png
    - 2.png
    - ...
    - digitStruct.mat
  - test
    - 1.png
    - 2.png
    - ...
    - digitStruct.mat
  - train
    - 1.png
    - 2.png
    - ...
    - digitStruct.mat
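Before running the conversion step, it can help to confirm that the extracted data matches the layout above. The following is a minimal sketch using only the standard library; `missing_entries` and `SPLITS` are hypothetical names and are not part of this repository.

```python
from pathlib import Path

# Hypothetical helper, not part of the repository.
# The three splits listed in the folder structure above.
SPLITS = ("train", "test", "extra")


def missing_entries(data_dir):
    """Return the expected paths that are absent under data_dir."""
    root = Path(data_dir)
    missing = []
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            missing.append(str(split_dir))
            continue
        # Each split is expected to contain its annotation file.
        if not (split_dir / "digitStruct.mat").is_file():
            missing.append(str(split_dir / "digitStruct.mat"))
        # Each split is expected to contain at least one PNG image.
        if not any(split_dir.glob("*.png")):
            missing.append(str(split_dir / "*.png"))
    return missing
```

Calling `missing_entries("./data")` returns an empty list when all three splits are in place; otherwise it lists what still needs to be extracted.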
$ python convert_to_lmdb.py --data_dir ./data
Save the training log to train.log and the trained models to the ./logs directory:
$ python train.py --data_dir ./data --logdir ./logs >> train.log
The output is:
data/test.lmdb
Start training
=> 2021-11-01 16:47:58.488561: step 100, loss = 7.821150, learning_rate = 0.100000 (2290.5 examples/sec)
=> 2021-11-01 16:48:43.666231: step 200, loss = 7.897614, learning_rate = 0.100000 (2284.5 examples/sec)
=> 2021-11-01 16:49:30.083858: step 300, loss = 7.818493, learning_rate = 0.100000 (2293.1 examples/sec)
=> 2021-11-01 16:50:15.438407: step 400, loss = 7.806008, learning_rate = 0.100000 (2276.1 examples/sec)
=> 2021-11-01 16:51:02.383038: step 500, loss = 7.821648, learning_rate = 0.100000 (2284.2 examples/sec)
=> 2021-11-01 16:51:47.870291: step 600, loss = 7.811975, learning_rate = 0.100000 (2269.1 examples/sec)
=> 2021-11-01 16:52:34.556187: step 700, loss = 7.864832, learning_rate = 0.100000 (2283.2 examples/sec)
=> 2021-11-01 16:53:20.091155: step 800, loss = 7.786717, learning_rate = 0.090000 (2266.6 examples/sec)
=> 2021-11-01 16:54:06.938361: step 900, loss = 7.849339, learning_rate = 0.090000 (2278.9 examples/sec)
=> 2021-11-01 16:54:52.568350: step 1000, loss = 7.795635, learning_rate = 0.090000 (2261.9 examples/sec)
=> Model saved to file: ./logs/model-1000.pdparams
=> patience = 100
=> Evaluating on validation dataset...
==> accuracy = 0.022880, best accuracy 0.000000
...
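Because the training output is redirected to train.log, the per-step lines can be parsed afterwards to inspect loss and throughput. The sketch below assumes the line format is exactly as shown in the log above; `STEP_LINE` and `parse_training_log` are hypothetical names and are not part of this repository.

```python
import re

# Matches step lines in the format shown in the log above, e.g.
# "=> ...: step 100, loss = 7.821150, learning_rate = 0.100000 (2290.5 examples/sec)"
STEP_LINE = re.compile(
    r"step (?P<step>\d+), loss = (?P<loss>[\d.]+), "
    r"learning_rate = (?P<lr>[\d.]+) \((?P<rate>[\d.]+) examples/sec\)"
)


def parse_training_log(lines):
    """Extract one record per step line; all other lines are skipped."""
    records = []
    for line in lines:
        match = STEP_LINE.search(line)
        if match is None:
            continue
        records.append({
            "step": int(match.group("step")),
            "loss": float(match.group("loss")),
            "learning_rate": float(match.group("lr")),
            "examples_per_sec": float(match.group("rate")),
        })
    return records


# Two lines copied from the log above, plus one non-step line.
sample = [
    "=> 2021-11-01 16:47:58.488561: step 100, loss = 7.821150, learning_rate = 0.100000 (2290.5 examples/sec)",
    "=> patience = 100",
    "=> 2021-11-01 16:53:20.091155: step 800, loss = 7.786717, learning_rate = 0.090000 (2266.6 examples/sec)",
]
records = parse_training_log(sample)
```

To process a real run, pass an open file instead of `sample`, for example `parse_training_log(open("train.log"))`.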
If needed, retrain starting from a saved checkpoint:
$ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pdparams
$ python eval.py --data_dir ./data ./logs/model-100.pdparams
The output is:
Start evaluating
Evaluate /home/aistudio/logs/model-359000.pdparams on /home/aistudio/data/test.lmdb, accuracy = 0.953551
Done
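In Goodfellow et al. [1], a transcription is credited only when the whole digit sequence is correct. Assuming eval.py follows the same convention (an assumption, not confirmed here), the reported accuracy could be computed roughly as in the sketch below. `sequence_accuracy` is a hypothetical name, and the `(length, digits)` tuple layout is chosen for illustration only.

```python
def sequence_accuracy(predictions, labels):
    """Fraction of samples whose length and every digit match the label.

    Both arguments are lists of (length, digits) tuples. This layout is an
    assumption made for illustration, not the repository's actual format.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same size")
    if not labels:
        return 0.0
    correct = 0
    for (pred_len, pred_digits), (true_len, true_digits) in zip(predictions, labels):
        # A single wrong digit or a wrong length makes the whole sample wrong.
        if pred_len == true_len and list(pred_digits) == list(true_digits):
            correct += 1
    return correct / len(labels)


# Second sample has one wrong digit, so only one of two samples is correct.
predictions = [(2, [7, 5, 10, 10, 10]), (1, [3, 10, 10, 10, 10])]
labels = [(2, [7, 5, 10, 10, 10]), (1, [8, 10, 10, 10, 10])]
accuracy = sequence_accuracy(predictions, labels)
```

This metric is stricter than per-digit accuracy, which is why a model can read most digits correctly and still score noticeably lower.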
$ python infer.py --checkpoint=./logs/model-100.pdparams ./image/test1.png
The test1.png shows:
The infer output is:
length: 2
digits: 7 5 10 10 10
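The infer output above can be turned into the house number string by keeping only the first `length` digits. The sketch below assumes that the value 10 is a padding class meaning "no digit", which the output suggests but this document does not state; `NO_DIGIT` and `decode_prediction` are hypothetical names.

```python
# Assumption: class 10 marks an empty digit slot (padding).
NO_DIGIT = 10


def decode_prediction(length, digits):
    """Join the first `length` predicted digits into a string."""
    kept = list(digits[:length])
    if len(kept) < length or NO_DIGIT in kept:
        # The length head and the digit heads disagree with each other.
        raise ValueError("predicted length is inconsistent with the digits")
    return "".join(str(d) for d in kept)


# Values taken from the infer output above.
number = decode_prediction(2, [7, 5, 10, 10, 10])
```

For the output shown, `number` is the string `"75"`.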
$ rm -rf ./logs
or
$ rm -rf ./logs_retrain
├─convert_to_lmdb.py
├─dataset.py
├─eval.py
├─model.py
├─evaluator.py
├─draw_bbox.py
├─example_pb2.py
├─infer.py
├─read_lmdb_sample.py
├─visiualize.py
├─train.py
├─train.log
├─images
│ test1.png