Productivity and Predictability for Measuring Morphological Complexity

This repository contains the code for calculating the entropy rate of a subword language model. It accompanies the article "Productivity and Predictability for Measuring Morphological Complexity".
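As a rough illustration of the quantity involved: the entropy rate of a language model on a text is commonly estimated as the average negative log-probability the model assigns to each unit. The sketch below is not taken from this repository. The function `entropy_rate` and its input (a list of per-unit probabilities already produced by some model) are hypothetical names used only for illustration.

```python
import math


def entropy_rate(token_probabilities):
    """Average negative log2 probability, in bits per unit.

    `token_probabilities` is assumed to hold the probability that some
    language model assigned to each subword unit of a text, in order.
    """
    if not token_probabilities:
        raise ValueError("need at least one probability")
    total = -sum(math.log2(p) for p in token_probabilities)
    return total / len(token_probabilities)
```

A model that assigns probability 0.5 to every unit would, under this estimate, have an entropy rate of 1 bit per unit.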

This program runs on Python 3 and uses the following libraries:

Basic Usage

python3 main.py --input directory

The directory passed to --input should contain a parallel corpus, with one file per language. Each file must already be tokenized.
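The expected layout can be checked before running the program. The following is a minimal sketch, not part of this repository: the helpers `load_parallel_corpus` and `check_parallel` are hypothetical, and the sketch assumes UTF-8 files with one sentence per line and whitespace-separated tokens, which the description above does not state explicitly.

```python
from pathlib import Path


def load_parallel_corpus(directory):
    """Read every file in `directory` as one language of a parallel corpus.

    Returns a dict mapping file name -> list of sentences, where each
    sentence is a list of whitespace-separated tokens.
    """
    corpus = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as handle:
            # Assumption: files are already tokenized, one sentence per line.
            corpus[path.name] = [line.split() for line in handle]
    return corpus


def check_parallel(corpus):
    """Return True when every language file has the same number of lines."""
    lengths = {len(sentences) for sentences in corpus.values()}
    return len(lengths) <= 1
```

Calling `check_parallel(load_parallel_corpus("directory"))` gives a quick signal that the files line up before the corpus is passed to main.py.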

Corpora for the languages mentioned in the article were pre-processed and extracted from:

To run the model with different parameters, execute the program as in the following example:

python3 main.py --input directory --n 1 --iter 100
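For readers unfamiliar with this kind of command line, the sketch below shows how flags like these are typically parsed with Python's standard argparse module. It is a hypothetical mirror of the flags shown above, not the code in main.py. The integer types for --n and --iter are assumptions, and their real meaning and default values are defined by main.py itself.

```python
import argparse


def build_parser():
    """Build a parser that accepts the three flags shown in this README."""
    parser = argparse.ArgumentParser(
        description="Hypothetical mirror of the main.py flags"
    )
    parser.add_argument(
        "--input", required=True,
        help="directory with one tokenized file per language",
    )
    # Assumption: both parameters are integers; the real defaults are unknown,
    # so None is used here as a placeholder.
    parser.add_argument("--n", type=int, default=None)
    parser.add_argument("--iter", type=int, default=None)
    return parser


# Parse the same arguments as in the example command above.
args = build_parser().parse_args(
    ["--input", "directory", "--n", "1", "--iter", "100"]
)
print(args.input, args.n, args.iter)
```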