Word frequencies for all the texts (~10,600), in as-is and normalized versions:
- 1gram: folder with word frequencies for each text (as is, with no normalization of orthography);
- 1gram_NRM: folder with word frequencies for each text (normalized: alifs simplified into one form; carriers of medial and final hamzaŧs removed);
- 1gram_NRM_Lengths: file with the length of each text in words.
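
The normalization and counting described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script that produced these folders: the helper names (`normalize`, `word_frequencies`), the exact character mappings, and the whitespace tokenization are all hypothetical, and the real rules may differ.

```python
import re
from collections import Counter

# Assumption: alif variants (madda, hamza above, hamza below, wasla)
# are the forms "simplified into one form".
ALIF_VARIANTS = re.compile("[\u0622\u0623\u0625\u0671]")
# Assumption: "carriers removed" means waw-with-hamza and ya-with-hamza
# are reduced to the bare hamza.
HAMZA_CARRIERS = re.compile("[\u0624\u0626]")


def normalize(text: str) -> str:
    """Return text with alifs unified and hamza carriers dropped."""
    text = ALIF_VARIANTS.sub("\u0627", text)   # -> bare alif
    text = HAMZA_CARRIERS.sub("\u0621", text)  # -> bare hamza
    return text


def word_frequencies(text: str, normalized: bool = False) -> Counter:
    """Count whitespace-separated tokens, optionally after normalization."""
    if normalized:
        text = normalize(text)
    return Counter(text.split())


# Two spellings of the same word: one with hamza-below alif, one with bare alif.
sample = "\u0625\u0644\u0649 \u0627\u0644\u0649"
as_is = word_frequencies(sample)                 # two distinct entries
nrm = word_frequencies(sample, normalized=True)  # one entry, counted twice
length_in_words = sum(nrm.values())              # text length in words
```

Under these assumptions, the as-is counts keep orthographic variants apart, the normalized counts merge them, and summing the counts gives the kind of per-text word length recorded in 1gram_NRM_Lengths.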