Opensubtitles_dataset

Downloads and parses the subtitle dataset from opensubtitles.org.

Usage

python3 parse_opensubtitle_xml.py

The command above downloads a zip archive containing the English OpenSubtitles corpus and extracts the text from all of its XML files, discarding the metadata.
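The extraction step can be illustrated with a minimal sketch. It is not the code in parse_opensubtitle_xml.py. It assumes each subtitle file wraps every sentence in an `<s>` element with word tokens and `<time>` tags inside; the function names `extract_sentences` and `extract_zip` and the sample document are hypothetical. The download is left out, and a small in-memory zip stands in for the real archive.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def extract_sentences(xml_bytes):
    """Return one string per <s> element, with all markup dropped.

    Assumption: sentences live in <s> elements (hypothetical schema).
    """
    root = ET.fromstring(xml_bytes)
    sentences = []
    for s in root.iter("s"):
        # itertext() yields every text fragment inside the element, so
        # tags such as <time> and <w> disappear and only words remain.
        # Joining and re-splitting collapses whitespace between tokens.
        words = " ".join(" ".join(s.itertext()).split())
        if words:
            sentences.append(words)
    return sentences


def extract_zip(zip_source):
    """Map each .xml member of a zip archive to its list of sentences."""
    result = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                result[name] = extract_sentences(archive.read(name))
    return result


# Made-up sample document standing in for one file of the real corpus.
SAMPLE_XML = b"""<document id="1">
  <meta><source><year>1999</year></source></meta>
  <s id="1"><time id="T1S" value="00:00:01,000" /><w id="1.1">Hello</w> <w id="1.2">there</w></s>
  <s id="2"><w id="2.1">Goodbye</w><time id="T1E" value="00:00:02,000" /></s>
</document>"""

# Build a tiny zip in memory instead of downloading anything.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample/1.xml", SAMPLE_XML)
buffer.seek(0)

texts = extract_zip(buffer)
print(texts)
```

Because only `<s>` elements are visited, anything outside them, such as the `<meta>` block in the sample, never reaches the output. `extract_zip` also accepts a path to a zip file on disk, since `zipfile.ZipFile` takes either a path or a file-like object.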