This repo contains the AVLetters [1], AVDigits [2], and AVLetters2 [3] audio-visual speech recognition (AVSR) datasets. All of the files are hosted on Google cloud.
References:
[1] Matthews, Iain, et al. "Extraction of visual features for lipreading." IEEE Transactions on Pattern Analysis and Machine Intelligence 24.2 (2002): 198-213.
[2] Hu, Di, and Xuelong Li. "Temporal multimodal learning in audiovisual speech recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[3] Cox, Stephen J., et al. "The challenge of multispeaker lip-reading." AVSP. 2008.