The Entity Synthetic Dataset is a multi-speaker, multi-locale (en-*) synthetic TTS dataset of entities collected from NELL [1] and Yago [2] for the paper "Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems".
Please use the dataset for research or non-commercial purposes.
The dataset is available on both OneDrive and BaiduCloud, with scripts in txt files and synthetic audio in zip files. Use whichever source is more convenient for you.
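After downloading, a quick look inside the files can confirm that the archives and scripts arrived intact. The following is a minimal sketch, not part of the dataset's tooling: the function names are hypothetical, and it assumes the audio files use a `.wav` extension and the script files are UTF-8 text with one entry per line, neither of which is stated here.

```python
import zipfile
from pathlib import Path


def list_audio_members(zip_path, suffixes=(".wav",)):
    """Return sorted names of archive members whose extension is in `suffixes`.

    The default ".wav" suffix is an assumption about the audio format.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if Path(name).suffix.lower() in suffixes
        )


def read_script_lines(txt_path, encoding="utf-8"):
    """Return the non-empty lines of a script file, stripped of whitespace.

    UTF-8 and a one-entry-per-line layout are assumptions.
    """
    with open(txt_path, encoding=encoding) as handle:
        return [line.strip() for line in handle if line.strip()]
```

Comparing the number of audio members against the number of script lines is one simple sanity check, provided the two actually correspond one to one in the downloaded files.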
August 2022: updated the entity synthetic dataset and examples.
[1] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling, “Never-ending learning,” in Proc. AAAI, 2015.
[2] T. P. Tanon, G. Weikum, and F. Suchanek, “YAGO 4: A reason-able knowledge base,” in Extended Semantic Web Conference, 2020.