Pinned Repositories
corpus_toolkit
Python toolkit for corpus analysis: tokenization, lexical diversity, vocabulary growth prediction, entropy measures, and Zipf/Heaps visualizations.
hayes2009
Hayes 2009 phonological feature tables
linguist_toolkit
Python and coding tools for collecting text and audio data.
morpheme_segmenter
Python code that segments words into morphemes based on statistical properties of a corpus.
MorphoLex-en
Lexical database for ~70k English words with morphological variables
shannon
This project uses KenLM to analyze language entropy and redundancy in English and Linear B.
suxotin
Python script that distinguishes vowels from consonants using Suxotin's algorithm.
syllabify
Python module for syllabifying English ARPABET transcriptions
thesis
Contains the code from my thesis project.
writing_direction
This script predicts language directionality (LTR or RTL) using Gini and entropy calculations on character distributions from Europarl and UDHR corpora.
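The two measures named in the writing_direction description, Gini and entropy over a character distribution, can be sketched roughly as below. This is a minimal illustration and not the repository's code: the function names are hypothetical, whitespace is dropped as an assumption, and how the script combines the two values to decide between LTR and RTL is not stated in the description, so the sketch does not attempt it.

```python
from collections import Counter
import math


def char_distribution(text):
    """Relative frequency of each non-whitespace character (hypothetical helper)."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no non-whitespace characters")
    return {ch: n / total for ch, n in counts.items()}


def shannon_entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gini(values):
    """Gini coefficient of non-negative values: 0 for a perfectly even spread."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


if __name__ == "__main__":
    dist = char_distribution("the quick brown fox jumps over the lazy dog")
    print("entropy (bits):", shannon_entropy(dist.values()))
    print("gini:", gini(dist.values()))
```

An even distribution gives high entropy and a Gini value near 0, while a distribution dominated by a few characters gives low entropy and a Gini value closer to 1, so the two numbers describe the same skew from opposite ends.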
jhnwnstd's Repositories
jhnwnstd/linguist_toolkit
Python and coding tools for collecting text and audio data.
jhnwnstd/corpus_toolkit
Python toolkit for corpus analysis: tokenization, lexical diversity, vocabulary growth prediction, entropy measures, and Zipf/Heaps visualizations.
jhnwnstd/shannon
This project uses KenLM to analyze language entropy and redundancy in English and Linear B.
jhnwnstd/suxotin
Python script that distinguishes vowels from consonants using Suxotin's algorithm.
jhnwnstd/hayes2009
Hayes 2009 phonological feature tables
jhnwnstd/morpheme_segmenter
Python code that segments words into morphemes based on statistical properties of a corpus.
jhnwnstd/MorphoLex-en
Lexical database for ~70k English words with morphological variables
jhnwnstd/syllabify
Python module for syllabifying English ARPABET transcriptions
jhnwnstd/writing_direction
This script predicts language directionality (LTR or RTL) using Gini and entropy calculations on character distributions from Europarl and UDHR corpora.
jhnwnstd/thesis
Contains the code from my thesis project.