german-reddit

Extraction of a German Reddit Corpus

Primary language: Python. License: MIT.

References

Barbaresi, Adrien (2015). "Collection, Description, and Visualization of the German Reddit Corpus". In: Proceedings of the 2nd Workshop on Natural Language Processing for Computer-Mediated Communication, pp. 7-11. German Society for Computational Linguistics & Language Technology.

Tools released for the NLP 4 CMC workshop.

Requirements

The whole Reddit corpus is available from archive.org.

Requirements: