- token unigrams/bigrams (= n-grams with n = 1 and n = 2)
- CQA analysis (reference does not contain string "CQA")
- hierarchical softmax (quora)
- skip-bigram model (probably as in bigram (wiki): "Gappy bigrams or skipping bigrams are word pairs which allow gaps (perhaps avoiding connecting words, or allowing some simulation of dependencies, as in a dependency grammar).")
- distributed representation (quora)
- AU[RO]C (area under the ROC curve)
- Kolmogorov–Smirnov test
- CDF (= cumulative distribution function)
- wiki
- p-value, Cohen's d (effect size)
- distributional semantics
- wiki
- linguistic items with similar distributions have similar meanings
- i.e. words that are used and occur in the same contexts tend to have similar meanings
- i.e. a word is characterized by the company it keeps
- Word embedding (wiki)
- "words or phrases from the vocabulary are mapped to vectors of real numbers"
- "Methods to generate this mapping include [...] explicit representation in terms of the context in which words appear."
- word sense disambiguation
- wiki
- "identifying which sense of a word (i.e. meaning) is used in a sentence, when the word has multiple meanings"
- POS tags
- Part-of-speech tagging (wiki)
- aka grammatical tagging, word-category disambiguation
- simplified: "identification of words as nouns, verbs, adjectives, adverbs, etc."
- support vector machine
- Fleiss' kappa
- wiki
- "statistical measure for assessing the reliability of agreement between a fixed number of raters when assigning categorical ratings to a number of items or classifying items"
- F-score (F1 score)
- wiki
- "can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst at 0"
- cross-validation
- Levenshtein distance
- wiki
- "the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other"
- = "edit distance" in information retrieval (Levenshtein distance is the most common edit distance)
- Louvain Modularity
- wiki
- "method to extract communities from large networks"
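
A few of the terms above are easier to remember with a concrete example. The following is a minimal Python sketch, not taken from any of the cited sources: the function names (`ngrams`, `skip_bigrams`, `f1_score`, `levenshtein`) and the `max_skip` parameter are made up for illustration. It assumes that a skip bigram is an ordered token pair with a bounded number of tokens in between, and it shows the balanced F1 score as the harmonic mean of precision and recall.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Contiguous token n-grams (n=1: unigrams, n=2: bigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens: Sequence[str], max_skip: int) -> List[Tuple[str, str]]:
    """Ordered token pairs with at most `max_skip` tokens between them.

    Assumption: with max_skip=0 this reduces to ordinary bigrams.
    """
    pairs = []
    for i, left in enumerate(tokens):
        # The right-hand token may be up to max_skip + 1 positions away.
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


def f1_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b` (row-by-row dynamic programming).
    """
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]
```

For example, `ngrams(["a", "b", "c"], 2)` yields the two bigrams `("a", "b")` and `("b", "c")`, and `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion).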