lookup

token unigrams/bigrams (= n-grams with n = 1 and n = 2)
CQA analysis (reference does not contain string "CQA")
hierarchical softmax (quora)
skip bigram model
- bigram (wiki)
  - "Gappy bigrams or skipping bigrams are word pairs which allow gaps (perhaps avoiding connecting words, or allowing some simulation of dependencies, as in a dependency grammar)."?
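
Sketch of n-grams and skip bigrams on a token list. The function names and the `max_skip` parameter are made up for these notes; whether the referenced "skip bigram model" uses exactly this gap definition is an assumption.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, max_skip):
    """Return ordered word pairs with at most max_skip tokens between them.

    max_skip=0 gives ordinary bigrams (assumed gap definition).
    """
    pairs = []
    for i, left in enumerate(tokens):
        # j runs over the next max_skip + 1 positions, clipped at the end
        for j in range(i + 1, min(i + max_skip + 2, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


tokens = "the quick brown fox".split()
print(ngrams(tokens, 1))        # unigrams
print(ngrams(tokens, 2))        # bigrams
print(skip_bigrams(tokens, 1))  # bigrams plus pairs that skip one token
```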
distributed representation (quora)
AUC / AUROC (area under the ROC curve)
- stats.stackexchange.com
- eli5
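
Sketch of the area under the ROC curve via its ranking interpretation: the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counted as half. The function name and the toy data are made up; the quadratic loop is for readability, not speed.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]            # toy data (assumption)
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))     # 3 of 4 pairs are ranked correctly
```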
Kolmogorov–Smirnov test
- wiki
CDF
- = Cumulative distribution function
- wiki
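
Sketch tying the CDF to the Kolmogorov–Smirnov test: an empirical CDF, and the two-sample KS statistic as the largest vertical gap between two empirical CDFs. Only the statistic is computed here, no p-value; function names and samples are made up.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F with F(x) = fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)|.

    Both empirical CDFs are step functions that only change at observed
    values, so checking the observed values is enough.
    """
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    points = set(sample_a) | set(sample_b)
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


a = [1, 2, 3, 4]   # toy samples (assumption)
b = [3, 4, 5, 6]
print(ecdf(a)(2.5))        # share of a that is <= 2.5
print(ks_statistic(a, b))
```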
p-value, Cohen's d (effect size)
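
Sketch of Cohen's d as the difference of two group means divided by the pooled sample standard deviation. This is one common variant (other variants use a different denominator); the function name and the data are made up.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference of means in units of the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (divides by n - 1)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


group_a = [2, 4, 6]   # toy data (assumption)
group_b = [1, 3, 5]
print(cohens_d(group_a, group_b))
```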

distributional semantics
- wiki
  - linguistic items with similar distributions have similar meanings
  - i.e. words that are used and occur in the same contexts tend to purport similar meanings
  - i.e. a word is characterized by the company it keeps
- Word embedding (wiki)
  - "words or phrases from the vocabulary are mapped to vectors of real numbers"
  - "Methods to generate this mapping include [...] explicit representation in terms of the context in which words appear."
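
Sketch of an "explicit representation in terms of the context": each word becomes a vector of co-occurrence counts within a small window, and words used in the same contexts end up with similar vectors. The toy corpus, the window size, and the function names are made up; learned embeddings work differently.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=1):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


sentences = ["i drink tea", "i drink coffee", "i read books"]  # toy corpus
vecs = context_vectors(sentences)
print(cosine(vecs["tea"], vecs["coffee"]))  # same context ("drink")
print(cosine(vecs["tea"], vecs["books"]))   # no shared context
```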
word sense disambiguation
- wiki
  - "identifying which sense of a word (i.e. meaning) is used in a sentence, when the word has multiple meanings"
POS tags
- Part-of-speech tagging (wiki)
  - aka grammatical tagging, word-category disambiguation
  - simplified: "identification of words as nouns, verbs, adjectives, adverbs, etc."

Fleiss' kappa
- wiki
  - "statistical measure for assessing the reliability of agreement between a fixed number of raters when assigning categorical ratings to a number of items or classifying items"
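
Sketch of Fleiss' kappa from a count table, where `counts[i][j]` is the number of raters who put item i into category j and every row sums to the same number of raters (at least 2). The function name and the example tables are made up.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for an items x categories table of rating counts."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_cats = len(counts[0])
    # share of all ratings that went to each category
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_cats)]
    # observed agreement per item: share of rater pairs that agree
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    # undefined (division by zero) if all ratings fall into one category
    return (p_observed - p_expected) / (1 - p_expected)


print(fleiss_kappa([[3, 0], [0, 3]]))  # three raters, full agreement
print(fleiss_kappa([[2, 1], [1, 2]]))  # agreement below chance level
```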
f-score
- wiki
  - "can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst at 0"
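
Sketch of precision, recall and F1 for a binary task; F1 is the harmonic mean of precision and recall. Function name, the `positive` parameter, and the toy labels are made up, and returning 0.0 for empty denominators is a convention chosen here.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


y_true = [1, 1, 1, 0, 0]   # toy labels (assumption)
y_pred = [1, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))
```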
cross-validation
- split data into k chunks (folds) and rotate which chunk is held out for validation while the rest is used for training
- wiki
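
Sketch of the chunk rotation in k-fold cross-validation, on indices only. The function name is made up, and there is no shuffling here; in practice the data is usually shuffled first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) once per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # the first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_splits(5, 2):
    print("train:", train, "validation:", validation)
```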
Levenshtein distance
- wiki
  - "the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other"
  - = InfoRet "edit distance"? (Levenshtein distance is the most common edit distance)
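
Sketch of the Levenshtein distance with the usual dynamic-programming table, keeping only two rows in memory. The function name is made up; each insertion, deletion and substitution costs 1.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits to turn a into b."""
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```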
