Plot word rank against word count on log-log axes for each language in the Europarl corpus.

Primary language: Python

europarl-loglog

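The Europarl Corpus is a parallel corpus of transcripts from the European Parliament, widely used for machine translation. This demo uses the corpus to show how consistently Zipf's law holds across natural languages: when words are ranked by frequency, count falls off roughly as a power of rank, so each language traces an approximately straight line on a log-log plot.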
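As a rough illustration of what the plot measures, the sketch below ranks word counts and estimates the slope of log(count) against log(rank). It is not the code from this repository; the helper names rank_counts and loglog_slope are hypothetical, and the toy counts are constructed to follow count = 1200 / rank exactly.

```python
import math
from collections import Counter


def rank_counts(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    Expects at least two counts, sorted in descending order.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Toy data (an assumption for illustration): count = 1200 / rank.
ideal = [1200 // rank for rank in (1, 2, 3, 4, 5, 6)]
print(loglog_slope(ideal))  # approximately -1.0
```

On real text the slope is only roughly -1, and the line usually bends at the highest and lowest ranks, so a fit like this is a summary rather than a test of the law.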

What you need

  • matplotlib
  • nltk (uses the corpus reader)
  • ft (library for manipulating lists of dictionaries: sudo pip install ft)

Notes

You can get the corpus with nltk.download.

The corpus path is hard-coded. Before running the script, edit the europarl_path variable so it points to the corpus location on your machine.
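One way to avoid editing the source on every machine is to read the path from the environment and fall back to a default. The sketch below is an assumption, not part of the script: the EUROPARL_PATH variable and both helper names are hypothetical, and "europarl_raw" is the identifier of the sample Europarl package in the NLTK data index, which may not be the copy the author used.

```python
import os

# Hypothetical environment variable; the original script does not read it.
ENV_VAR = "EUROPARL_PATH"


def resolve_europarl_path(default_path):
    """Return the corpus path from EUROPARL_PATH if set, else default_path."""
    return os.path.expanduser(os.environ.get(ENV_VAR, default_path))


def download_europarl():
    """Fetch the NLTK sample of the corpus (needs nltk and network access)."""
    import nltk  # imported lazily so the path helper works without nltk

    return nltk.download("europarl_raw", quiet=True)
```

In the script, the hard-coded assignment could then become europarl_path = resolve_europarl_path("/path/to/europarl"), where the default shown here is only a placeholder.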