/corpus

A large text file of Toaq writings, plus tools to clean and analyze it.

Primary language: Python

toaq-corpus.txt was compiled by hand-copying #toaq-only Discord logs, #kaise stories (Hoaqgio's stories, Ilmen/xorxes's translations), and the extracted text from Seoqrea's translation of Májīpōq roı Rúaı Ỉomā into one file.

frequency.py will spit out a frequency list of official and unofficial words.
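As a rough illustration of what such a frequency count could look like (this is not the actual code in frequency.py), here is a minimal sketch. The function name `word_frequencies`, the `official_words` set, and the tokenization rule are all assumptions made for the example; the real script may split words, handle tone marks, and decide what counts as "official" differently.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a "word" is any run of Unicode letters. Tone marks are kept,
# so differently toned forms are counted as separate entries.
WORD_RE = re.compile(r"[^\W\d_]+")


def word_frequencies(text, official_words):
    """Return (official, unofficial) Counters of word frequencies.

    `official_words` is a hypothetical set of dictionary words, expected
    to be lowercase and NFC-normalized like the tokens produced here.
    """
    # Normalize so that precomposed and combining-mark spellings of the
    # same letter compare equal, then lowercase.
    normalized = unicodedata.normalize("NFC", text).lower()
    counts = Counter(WORD_RE.findall(normalized))

    official = Counter({w: n for w, n in counts.items() if w in official_words})
    unofficial = Counter({w: n for w, n in counts.items() if w not in official_words})
    return official, unofficial


# Possible usage (the file name comes from this README; the word list is made up):
#
#   with open("toaq-corpus.txt", encoding="utf-8") as f:
#       official, unofficial = word_frequencies(f.read(), {"jadı", "hao"})
#   for word, n in official.most_common(20):
#       print(n, word)
```

Splitting the result into two counters keeps the official and unofficial lists separately sortable, which matches the description above, but a single counter with a flag per word would work just as well.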