A word generator driven by N-gram probabilities, written in Python using only numpy (plus tqdm for progress bars).
Based on the bigram model from Andrej Karpathy's makemore lectures.
Example usage:
python3 ngram.py -fdatasets/names.txt -n6 -N30 --skip-existing --show-existing > out.txt
- -fdatasets/names.txt (use the file located at ./datasets/names.txt) - required
- -n6 (6-gram model) - required
- -N30 (generate 30 words) - required
- --skip-existing (skip word generations that already exist in the dataset) - optional
- --show-existing (display "✓" after words that already exist in the dataset) - optional
- > out.txt (redirect the generated words into a file, e.g. out.txt) - optional
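For readers wondering how flags written like -n6 or -fdatasets/names.txt can be parsed, here is a minimal sketch using the standard argparse module. It is not taken from ngram.py: the function name parse_args and the attribute names file, n, and count are hypothetical, and the real script may handle its arguments differently.

```python
import argparse

def parse_args(argv=None):
    # Hypothetical parser mirroring the flags in the usage example above.
    parser = argparse.ArgumentParser(description="N-gram word generator (sketch)")
    parser.add_argument("-f", dest="file", required=True,
                        help="path to the dataset, one word per line")
    parser.add_argument("-n", dest="n", type=int, required=True,
                        help="order of the n-gram model")
    parser.add_argument("-N", dest="count", type=int, required=True,
                        help="number of words to generate")
    parser.add_argument("--skip-existing", action="store_true",
                        help="skip generated words already in the dataset")
    parser.add_argument("--show-existing", action="store_true",
                        help="mark generated words already in the dataset")
    return parser.parse_args(argv)

# argparse accepts a value attached directly to a short option, e.g. "-n6".
args = parse_args(["-fdatasets/names.txt", "-n6", "-N30",
                   "--skip-existing", "--show-existing"])
print(args.file, args.n, args.count, args.skip_existing, args.show_existing)
```

Short options are case-sensitive in argparse, so -n and -N can coexist as separate arguments.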
Dependencies:
- numpy (arrays, multinomial)
- tqdm (loading bars)
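As a rough illustration of how the numpy pieces listed above (arrays, multinomial) can fit together, here is a minimal character-level n-gram sketch. It is not the code in ngram.py: the function names build_counts and generate, the "." boundary token, and the toy word list are all assumptions made for this example, and it assumes words are non-empty and do not contain ".".

```python
import numpy as np

def build_counts(words, n):
    """For every (n-1)-character context, count how often each character follows it."""
    # Index 0 is reserved for the boundary token "." marking word start and end.
    itos = ["."] + sorted(set("".join(words)))
    stoi = {ch: i for i, ch in enumerate(itos)}
    counts = {}
    for word in words:
        padded = "." * (n - 1) + word + "."
        for i in range(len(padded) - n + 1):
            context = padded[i:i + n - 1]
            nxt = padded[i + n - 1]
            row = counts.setdefault(context, np.zeros(len(itos), dtype=np.int64))
            row[stoi[nxt]] += 1
    return counts, itos

def generate(counts, itos, n, rng):
    """Sample one word, character by character, until the boundary token is drawn."""
    context = "." * (n - 1)
    out = []
    while True:
        row = counts[context]
        probs = row / row.sum()  # normalize counts into a probability distribution
        # multinomial(1, probs) returns a one-hot vector; argmax recovers the index.
        idx = int(rng.multinomial(1, probs).argmax())
        if idx == 0:  # boundary token: the word is finished
            return "".join(out)
        out.append(itos[idx])
        context = (context + itos[idx])[1:]  # slide the context window by one

# Toy dataset (hypothetical); the real script reads words from a file.
words = ["emma", "olivia", "ava", "isabella", "sophia"]
rng = np.random.default_rng(0)
counts, itos = build_counts(words, 3)
for _ in range(5):
    word = generate(counts, itos, 3, rng)
    # Mimics --show-existing: mark words that already occur in the dataset.
    print(word, "✓" if word in words else "")
```

With a large n and a small dataset, almost every context has only one observed continuation, so the model mostly reproduces dataset words. That is the situation in which skipping or marking existing words becomes useful.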
Todo:
- Option to save probabilities
- Option to count only unique generated words that do not already exist in the dataset
- Automatic n-gram probability distributions generator
Thanks: