We introduce HME100K, a large-scale, real-scene dataset for evaluating handwritten mathematical expression recognition. The code for SAN (Syntax-Aware Network) is available here.
The data were collected from tens of thousands of writers, who wrote the mathematical expressions (MEs) on paper and uploaded them to an internet application.
You can download the dataset from the official website: https://ai.100tal.com/dataset
HME100K
|
|---train
| |---train_images
| |
| |---train_labels.txt
|
|---test
| |---test_images
| |
| |---test_labels.txt
|
|---subset
  |---easy.json
  |
  |---medium.json
  |
  |---hard.json
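
After downloading and extracting the archive, it can be useful to confirm that the layout matches the tree above before training or evaluating. The following is a minimal sketch, not part of the official release: the helper name `missing_entries` and the constant `EXPECTED_ENTRIES` are hypothetical, and the code only checks that the listed paths exist. It makes no assumption about the contents of the label or JSON files.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_ENTRIES = [
    "train/train_images",
    "train/train_labels.txt",
    "test/test_images",
    "test/test_labels.txt",
    "subset/easy.json",
    "subset/medium.json",
    "subset/hard.json",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under `root`.

    `root` is the path to the extracted HME100K directory (hypothetical
    helper, written only for illustration).
    """
    root = Path(root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]
```

Calling `missing_entries("HME100K")` returns an empty list when every entry from the tree is present, and otherwise the list of relative paths that could not be found.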
If you find this dataset helpful for your research, please cite the following paper:
@article{yuan2022syntax,
title={Syntax-Aware Network for Handwritten Mathematical Expression Recognition},
author={Yuan, Ye and Liu, Xiao and Dikubab, Wondimu and Liu, Hui and Ji, Zhilong and Wu, Zhongqin and Bai, Xiang},
journal={arXiv preprint arXiv:2203.01601},
year={2022}
}