vocabulary-size
There are 2 repositories under the vocabulary-size topic.
samujjwaal/CiteSeer-Text-Processing
Tokenizes text in the CiteSeer document corpus and computes the frequency of every word in the collection.
pablobernabeu/Language-and-vision-in-conceptual-processing-Multilevel-analysis-and-statistical-power
This incomplete repository is used to facilitate the consultation of individual files in this project. Only files smaller than 100 MB are available here. The complete project is available at http://doi.org/10.17605/OSF.IO/UERYQ.
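As a rough illustration of what the CiteSeer-Text-Processing description refers to (tokenizing a corpus, counting word frequencies, and deriving a vocabulary size), here is a minimal sketch. It is not taken from that repository: the function names `tokenize` and `word_frequencies`, the lowercase alphanumeric tokenization rule, and the two sample documents are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens.

    Assumption: a token is any run of ASCII letters or digits; the actual
    repository may use a different rule (stemming, stop words, etc.).
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def word_frequencies(documents):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


# Hypothetical two-document corpus standing in for the real collection.
docs = [
    "Citation indexing of scientific literature.",
    "Indexing the literature: a citation graph.",
]

freqs = word_frequencies(docs)
vocabulary_size = len(freqs)  # number of distinct tokens in the corpus

print(freqs.most_common(3))
print("vocabulary size:", vocabulary_size)
```

The vocabulary size here is simply the number of distinct keys in the frequency table, so any change to the tokenization rule changes the result.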