kunansy/RNC

Move parsing of docs or examples to multiprocessing

Opened this issue · 2 comments

Parsing in MultilingualParaCorpus takes a lot of time.
Which parse method should be moved to multiprocessing: parse_doc or parse_example? Profile the project to find out.
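
A minimal profiling sketch with the standard-library `cProfile` and `pstats` modules could help answer that question. The `parse_doc` and `parse_example` bodies below are hypothetical stand-ins, not the real RNC methods, and `profile_parsing` is a helper name invented for illustration; the real methods would be called in their place.

```python
import cProfile
import io
import pstats


def parse_example(raw: str) -> dict:
    # Hypothetical stand-in for the real per-example parser.
    return {"text": raw.strip(), "words": raw.split()}


def parse_doc(raw_examples: list) -> list:
    # Hypothetical stand-in for the real per-document parser.
    return [parse_example(raw) for raw in raw_examples]


def profile_parsing(docs: list, top: int = 10) -> str:
    """Run parse_doc over all docs under cProfile and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for doc in docs:
        parse_doc(doc)
    profiler.disable()

    # Sort by cumulative time so the most expensive call chains come first.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    fake_docs = [["some example text"] * 50 for _ in range(200)]
    print(profile_parsing(fake_docs))
```

Comparing the cumulative time of `parse_doc` with that of `parse_example` in the report should show which level is worth parallelizing.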

If multiprocessing turns out to take more time, use it only in MultilingualParaCorpus.

Move parse_page to multiprocessing in the main Corpus class.
Set the number of processes according to the number of CPU cores.
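
A rough sketch of that idea, assuming `parse_page` can be called as a plain top-level function on one page of HTML (the body below is a hypothetical stand-in, and `default_process_count` and `parse_pages` are names invented for illustration):

```python
import os
from multiprocessing import Pool


def parse_page(html: str) -> list:
    # Hypothetical stand-in: the real parse_page would extract examples from a page.
    return [line.strip() for line in html.splitlines() if line.strip()]


def default_process_count() -> int:
    # os.cpu_count() can return None when the count cannot be determined.
    return os.cpu_count() or 1


def parse_pages(pages: list, processes: int = None) -> list:
    """Parse pages in parallel, keeping results in input order."""
    processes = processes or default_process_count()
    # Skip the pool when it cannot help, to avoid process start-up overhead.
    if processes == 1 or len(pages) < 2:
        return [parse_page(page) for page in pages]
    with Pool(processes=min(processes, len(pages))) as pool:
        return pool.map(parse_page, pages)


if __name__ == "__main__":
    pages = ["first\nsecond\n", "\nthird\n"]
    print(parse_pages(pages))
```

`Pool.map` pickles its arguments and results, so the function passed to it has to be importable at module level. If `parse_page` stays a method of Corpus, the instance would need to be picklable as well.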