RU_capstone
Rutgers ECE capstone (2019): Multilingual ASR data collection
Introduction
Crawl multilingual audio and text resources from the web, then run forced alignment on that data.
The project has two parts: the Crawler and the Aligner.
Crawler
In this part, we crawled two websites for multilingual audio and the corresponding text.
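As a rough illustration of the crawling step, the sketch below pulls audio file links out of a downloaded HTML page using only the Python standard library. It is not the project's actual crawler: the class name `AudioLinkParser`, the helper `extract_audio_urls`, the list of audio extensions, and the example URLs are all assumptions made for this example, and the real sites may need site-specific handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of audio extensions; adjust to what the target site serves.
AUDIO_EXTENSIONS = (".mp3", ".wav", ".ogg")


class AudioLinkParser(HTMLParser):
    """Collect audio URLs from <a href>, <audio src> and <source src> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.audio_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "audio", "source"):
            return
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Ignore any query string when checking the extension.
                path = value.lower().split("?")[0]
                if path.endswith(AUDIO_EXTENSIONS):
                    # Resolve relative links against the page URL.
                    self.audio_urls.append(urljoin(self.base_url, value))


def extract_audio_urls(html, base_url):
    """Return absolute URLs of audio files linked from an HTML page."""
    parser = AudioLinkParser(base_url)
    parser.feed(html)
    return parser.audio_urls


if __name__ == "__main__":
    # Hypothetical page fragment, not taken from either crawled site.
    page = '<a href="/audio/ch1.mp3">Listen</a><a href="ch1.html">Read</a>'
    print(extract_audio_urls(page, "https://example.com/bible/"))
```

The page HTML itself would be fetched separately (for example with `urllib.request`), and each returned URL can then be downloaded and stored next to its text.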
WordProject
WordProject is a website that provides the Bible in multiple languages; it supports 37 languages. The audio and text resources from this website match perfectly.
SBS News
SBS News is a news website that publishes news in more than 60 languages.
Aligner
In this part, we run forced alignment on the crawled data using Montreal-Forced-Aligner and Kaldi.
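Montreal-Forced-Aligner reads a corpus directory in which each audio file sits next to a transcript with the same base name. The sketch below checks a crawled directory for that pairing before alignment. The function name `pair_corpus_files` and the accepted extensions (`.wav` for audio, `.lab` or `.txt` for transcripts) are assumptions for this example, not the project's actual preprocessing code.

```python
from pathlib import Path

# Assumed extensions; crawled audio in other formats would need converting first.
AUDIO_SUFFIXES = {".wav"}
TEXT_SUFFIXES = {".lab", ".txt"}


def pair_corpus_files(corpus_dir):
    """Pair audio and transcript files that share a folder and base name.

    Returns (pairs, unmatched): a list of (audio_path, text_path) tuples and
    a sorted list of files that have no counterpart.
    """
    audio, text = {}, {}
    for path in sorted(Path(corpus_dir).rglob("*")):
        if not path.is_file():
            continue
        key = path.with_suffix("")  # same folder and same base name
        suffix = path.suffix.lower()
        if suffix in AUDIO_SUFFIXES:
            audio[key] = path
        elif suffix in TEXT_SUFFIXES:
            text[key] = path

    pairs = [(audio[k], text[k]) for k in sorted(audio.keys() & text.keys())]
    unmatched = sorted(
        [audio[k] for k in audio.keys() - text.keys()]
        + [text[k] for k in text.keys() - audio.keys()]
    )
    return pairs, unmatched
```

Files reported in `unmatched` can be re-crawled or dropped before the aligner is run, so that every recording handed to it has a transcript.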
The output is a set of TextGrid files.
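For readers unfamiliar with TextGrid, it is Praat's plain-text annotation format: each tier holds intervals with a start time, an end time, and a label. The sketch below reads the intervals out of a long-format TextGrid with a regular expression. The names `Interval` and `read_intervals` and the sample content are made up for this example; it does not separate tiers or handle Praat's short format, so a dedicated TextGrid library is a better fit for anything beyond a quick inspection.

```python
import re
from typing import List, NamedTuple


class Interval(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str     # aligned word or phone label


# Matches one interval entry: xmin, xmax, then a quoted text label.
# Tier-level and file-level xmin/xmax are skipped because no text line follows them.
_INTERVAL_RE = re.compile(
    r'xmin = (?P<start>[\d.eE+-]+)\s+'
    r'xmax = (?P<end>[\d.eE+-]+)\s+'
    r'text = "(?P<text>(?:[^"]|"")*)"'
)


def read_intervals(textgrid: str) -> List[Interval]:
    """Return every interval in a long-format TextGrid, in file order."""
    return [
        Interval(
            float(m["start"]),
            float(m["end"]),
            m["text"].replace('""', '"'),  # Praat doubles embedded quotes
        )
        for m in _INTERVAL_RE.finditer(textgrid)
    ]


# Hypothetical sample, not real output from this project.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 1.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.7
            text = "hello"
        intervals [2]:
            xmin = 0.7
            xmax = 1.5
            text = "world"
'''

if __name__ == "__main__":
    for interval in read_intervals(SAMPLE):
        print(interval)
```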
TextGrid demo:
Video Demo
Team Members
Mo Shi, Chaoji Zuo, Ziqi Wang, Zekun Zhang, Duc Le