The code used to generate templates for the web-languages repo: https://github.com/commoncrawl/web-languages

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

web-languages-code

This repo holds the code, templates, and data associated with the web-languages dataset.

Theory

Installing, etc.

make install

License

The code in this repo is licensed under the Apache 2.0 license.

The templates are licensed under CC0.

Data files (*.tsv) from mOSCAR and Wikipedia remain under the copyright of their respective sources.