AutoCorpus

AutoCorpus is a set of utilities for automatically extracting language corpora and language models from publicly available datasets. The AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.

Primary Language: C++

License: GNU Affero General Public License v3.0 (AGPL-3.0)
