Handwritten Chinese Character Recognition with PyTorch
Ubuntu: 16.04
Python: 3.5.2
PyTorch: 1.0.1 gpu
Split the data into `train` and `test` folders. Inside each folder, put the images of each class in their own sub-folder, and name that sub-folder with the class's integer label.
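The folder layout described above can be checked with a short script. The sketch below uses a hypothetical helper, `list_samples` (not part of this repository), that walks one split folder such as `<root>/train` and pairs every image with the integer label taken from its sub-folder name. The list of accepted file extensions is an assumption; adjust it to whatever format your images are saved in.

```python
import os
from typing import List, Tuple

# Hypothetical helper for checking the data layout; not part of chinese_character_rec.py.
# The accepted extensions are an assumption.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".pgm")


def list_samples(split_dir: str) -> List[Tuple[str, int]]:
    """Return (image_path, label) pairs from a folder whose sub-folders are integer labels."""
    samples = []
    for class_name in os.listdir(split_dir):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        if not class_name.isdigit():
            raise ValueError("sub-folder name is not an integer label: " + class_name)
        label = int(class_name)  # the folder name itself is the class label
        for file_name in sorted(os.listdir(class_dir)):
            if file_name.lower().endswith(IMAGE_EXTENSIONS):
                samples.append((os.path.join(class_dir, file_name), label))
    # Sort by numeric label, then by path, so the order is deterministic.
    samples.sort(key=lambda item: (item[1], item[0]))
    return samples
```

For example, `list_samples(os.path.join("/home/XXX/data", "train"))` lists the training images under the default `--root`. If the folders are later loaded with `torchvision.datasets.ImageFolder`, note that it sorts class folder names as strings, so a folder named `10` sorts before `2` and gets an index different from its name; zero-padding the folder names to a fixed width (for example `00002`) keeps the string order and the numeric order the same.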
This project uses the HWDB1.1 handwriting dataset (train_set, test_set). It can also be downloaded with:
```shell
wget http://www.nlpr.ia.ac.cn/databases/download/feature_data/HWDB1.1trn_gnt.zip
wget http://www.nlpr.ia.ac.cn/databases/download/feature_data/HWDB1.1tst_gnt.zip
```
This dataset contains 3755 classes in total.
To preprocess the data, we use a Python script from a blog post. That post also implements recognition on this dataset, but with TensorFlow.
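The downloaded archives contain `.gnt` files. This format is commonly described as a sequence of samples, each made of a 10-byte header (a 4-byte little-endian sample size, the 2-byte GB2312 code of the character, a 2-byte little-endian width and a 2-byte little-endian height) followed by `width * height` bytes of 8-bit grayscale pixels. The sketch below reads that layout with the standard library only and gives each new character an integer label in order of first appearance. It is an illustration under that format assumption, not the blog's script; `read_gnt` and `assign_labels` are hypothetical names.

```python
import struct
from typing import BinaryIO, Dict, Iterable, Iterator, Tuple

# Assumed GNT sample header: uint32 sample size (LE), 2 raw GB2312 bytes,
# uint16 width (LE), uint16 height (LE); 10 bytes in total.
HEADER = struct.Struct("<I2sHH")


def read_gnt(stream: BinaryIO) -> Iterator[Tuple[str, int, int, bytes]]:
    """Yield (character, width, height, pixels) for every sample in a GNT stream."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            break  # clean end of file
        if len(header) < HEADER.size:
            raise ValueError("truncated sample header")
        sample_size, tag, width, height = HEADER.unpack(header)
        if sample_size != HEADER.size + width * height:
            raise ValueError("sample size does not match width * height")
        pixels = stream.read(width * height)  # row-major, one byte per pixel
        if len(pixels) != width * height:
            raise ValueError("truncated bitmap")
        yield tag.decode("gb2312"), width, height, pixels


def assign_labels(characters: Iterable[str], label_map: Dict[str, int]) -> Dict[str, int]:
    """Give each character not yet in label_map the next integer label (0, 1, 2, ...)."""
    for ch in characters:
        if ch not in label_map:
            label_map[ch] = len(label_map)
    return label_map
```

A typical loop opens each `.gnt` file with `open(path, "rb")`, builds the label map over the training files first, reuses the same map for the test files so both splits share one labeling, and saves each bitmap into `<root>/train/<label>/` or `<root>/test/<label>/`, for example with Pillow's `Image.frombytes("L", (width, height), pixels)`.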
Run the script with:

```shell
python3 chinese_character_rec.py [option] [param]
```

where the options and their parameters are:
| Option | Type | Default | Help | Choices |
|---|---|---|---|---|
| `--root` | str | `'/home/XXX/data'` | path to the data set | |
| `--mode` | str | `'train'` | run mode | `'train'`, `'validation'`, `'inference'` |
| `--log_path` | str | `os.path.abspath('.') + '/log.pth'` | path of the checkpoint file | |
| `--restore` | bool | `True` | whether to restore from the checkpoint | |
| `--batch_size` | int | `16` | mini-batch size | |
| `--image_size` | int | `64` | size images are resized to | |
| `--epoch` | int | `100` | number of training epochs | |
| `--num_class` | int | `100` | number of classes to use | `range(10, 3755)`, i.e. 10 to 3754 |
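The table above maps onto an `argparse` parser. The sketch below rebuilds one from the table for reference; it is not copied from `chinese_character_rec.py`. One detail is worth knowing: with `type=bool`, `argparse` calls `bool()` on the raw string, so `--restore False` still yields `True` because any non-empty string is truthy. The sketch therefore uses a hypothetical `str2bool` converter for `--restore`; if the script keeps `type=bool`, passing an empty string (`--restore ''`) is what turns restoring off.

```python
import argparse
import os


def str2bool(value: str) -> bool:
    """Hypothetical converter: accept yes/no style strings instead of bool(str)."""
    lowered = value.lower()
    if lowered in ("1", "true", "yes", "y"):
        return True
    if lowered in ("0", "false", "no", "n"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the option table (reconstructed, not the original code)."""
    parser = argparse.ArgumentParser(description="Handwritten Chinese character recognition")
    parser.add_argument("--root", type=str, default="/home/XXX/data", help="path to data set")
    parser.add_argument("--mode", type=str, default="train",
                        choices=["train", "validation", "inference"])
    parser.add_argument("--log_path", type=str, default=os.path.abspath(".") + "/log.pth",
                        help="checkpoint file path")
    parser.add_argument("--restore", type=str2bool, default=True,
                        help="whether to restore checkpoints")
    parser.add_argument("--batch_size", type=int, default=16, help="size of mini-batch")
    parser.add_argument("--image_size", type=int, default=64, help="resize image")
    parser.add_argument("--epoch", type=int, default=100)
    # range(10, 3755) stops at 3754, so all 3755 classes cannot be selected.
    parser.add_argument("--num_class", type=int, default=100, choices=range(10, 3755))
    return parser
```

For example, `build_parser().parse_args(["--mode", "validation", "--restore", "no"])` gives `mode='validation'` and `restore=False`. Because the `--num_class` choices come from `range(10, 3755)`, a value of 3755 (the full dataset) is rejected; widening the range to `range(10, 3756)` would allow it.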
See: https://blog.csdn.net/qq_31417941/article/details/97915035