/ngram-wordgen

Word generator using n-gram probabilities

Primary language: Jupyter Notebook. License: MIT.


A word generator based on N-gram probabilities, written in Python and using only numpy for the numerical work.

It is based on the bigram model from Andrej Karpathy's makemore lectures, generalized to N-grams whose order is set with -n.
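The README does not show ngram.py itself, so here is a minimal sketch of how the bigram idea generalizes to N-grams. It is not the repository's implementation. It assumes a "." start/end token (as in makemore), keys the counts by the previous n-1 characters in a dictionary rather than a dense array (a dense table for n=6 over 27 symbols would need 27^6, about 387 million, cells), and samples each next character with numpy's multinomial, which the Dependencies list mentions. The names BOUNDARY, build_model and generate are hypothetical.

```python
import numpy as np
from collections import defaultdict

BOUNDARY = "."  # assumed start/end-of-word token, in the style of makemore


def build_model(words, n):
    """Count which character follows each (n-1)-character context, then normalize."""
    chars = sorted(set("".join(words)) | {BOUNDARY})
    stoi = {c: i for i, c in enumerate(chars)}
    counts = defaultdict(lambda: np.zeros(len(chars)))
    for w in words:
        # Pad with n-1 boundary tokens on the left and one on the right.
        padded = BOUNDARY * (n - 1) + w + BOUNDARY
        for i in range(len(padded) - n + 1):
            context, nxt = padded[i:i + n - 1], padded[i + n - 1]
            counts[context][stoi[nxt]] += 1
    # Each row of counts becomes a probability distribution over the next character.
    probs = {ctx: row / row.sum() for ctx, row in counts.items()}
    return probs, chars


def generate(probs, chars, n, rng):
    """Sample one word, drawing each next character with a one-trial multinomial."""
    context = BOUNDARY * (n - 1)
    out = []
    while True:
        # multinomial(1, p) returns a one-hot vector; argmax recovers the sampled index.
        c = chars[rng.multinomial(1, probs[context]).argmax()]
        if c == BOUNDARY:
            return "".join(out)
        out.append(c)
        # Slide the window: drop the oldest character, append the new one.
        context = (context + c)[1:]


if __name__ == "__main__":
    words = ["anna", "emma", "ava", "hannah"]
    probs, chars = build_model(words, 3)
    rng = np.random.default_rng(0)
    print([generate(probs, chars, 3, rng) for _ in range(5)])
```

Every context reached during generation was also seen in training, because the window only ever advances along characters that actually followed it, so the dictionary lookup cannot miss. Larger n makes the output copy the training words more closely, which is where the --skip-existing and --show-existing options become useful.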

Example usage:

python3 ngram.py -fdatasets/names.txt -n6 -N30 --skip-existing --show-existing > out.txt

  • -fdatasets/names.txt (use the file located at ./datasets/names.txt) - required
  • -n6 (6-gram model) - required
  • -N30 (generate 30 words) - required
  • --skip-existing (skip generated words that already exist in the dataset) - optional
  • --show-existing (display "✓" after words that already exist in the dataset) - optional
  • > out.txt (redirect the generated words into a file, e.g. out.txt; this is shell redirection, not a script option) - optional
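A minimal sketch of how these flags could be parsed with Python's argparse and applied to a list of generated words. The flag spellings come from the list above; the argparse layout, the dest names file, n and count, and the helper render are assumptions for illustration, not the script's actual code. argparse accepts the attached short-option form used in the example, so "-n6" parses the same as "-n 6".

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line flags listed in the README (argparse layout is assumed)."""
    parser = argparse.ArgumentParser(description="Generate words with an n-gram model.")
    parser.add_argument("-f", dest="file", required=True, help="path to the training word list")
    parser.add_argument("-n", dest="n", type=int, required=True, help="n-gram order, e.g. 6")
    parser.add_argument("-N", dest="count", type=int, required=True,
                        help="number of words to generate")
    parser.add_argument("--skip-existing", action="store_true",
                        help="drop generated words that already appear in the dataset")
    parser.add_argument("--show-existing", action="store_true",
                        help='append "✓" to generated words that already appear in the dataset')
    return parser.parse_args(argv)


def render(generated, dataset, skip_existing, show_existing):
    """Hypothetical helper: apply the two optional flags to already generated words."""
    lines = []
    for word in generated:
        exists = word in dataset
        if exists and skip_existing:
            continue  # --skip-existing: leave this word out entirely
        # --show-existing: mark words that also occur in the training data
        lines.append(f"{word} ✓" if exists and show_existing else word)
    return lines


if __name__ == "__main__":
    args = parse_args(["-fdatasets/names.txt", "-n6", "-N30", "--show-existing"])
    print(args.file, args.n, args.count)  # datasets/names.txt 6 30
```

The README does not say how the two flags interact or whether skipped words still count toward -N. In this sketch, setting both (as the example command does) removes existing words before the ✓ marker could be applied, so treat it as one possible reading rather than the script's behavior.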


Dependencies:

  • numpy (arrays, multinomial sampling)
  • tqdm (progress bars)

Todo:

  • Option to save the computed probabilities
  • Option to count only unique generations that do not already exist in the dataset
  • Automatic generator for n-gram probability distributions

Thanks: