lookup

still unclear / set aside for now

  • token unigrams/bigrams (= n-grams with n = 1 and n = 2)
  • CQA analysis (reference does not contain string "CQA")
  • hierarchical softmax (quora)
  • skip-bigram model
    • bigram (wiki)
      • "Gappy bigrams or skipping bigrams are word pairs which allow gaps (perhaps avoiding connecting words, or allowing some simulation of dependencies, as in a dependency grammar)."
      • same thing as the skip-bigram model?
  • distributed representation (quora)
  • AU[RO]C (area under the ROC curve)
  • Kolmogorov–Smirnov test
  • CDF
    • = Cumulative distribution function
    • wiki
  • p-value, Cohen's d (effect size)
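
A rough sketch of the unigram/bigram and gappy-bigram items above, to make the terms concrete. The function names `ngrams` and `skip_bigrams` and the `max_gap` parameter are made up for this note; they are not taken from any of the references.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, max_gap):
    """Return ordered word pairs with at most `max_gap` tokens between them.

    `max_gap` is an assumption of this sketch; with max_gap=0 the result
    is the same as the ordinary bigrams.
    """
    pairs = []
    for i, left in enumerate(tokens):
        # the right-hand word may sit up to max_gap + 1 positions further on
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 1))        # unigrams
print(ngrams(tokens, 2))        # bigrams
print(skip_bigrams(tokens, 1))  # bigrams plus pairs that skip one word
```

With `max_gap=1` the pair ("the", "sat") shows up although "cat" stands between the two words; that is the "gap" in the wiki quote.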

NLP

  • distributional semantics
    • wiki
      • linguistic items with similar distributions have similar meanings
      • i.e. words that are used and occur in the same contexts tend to purport similar meanings
      • i.e. a word is characterized by the company it keeps
    • Word embedding (wiki)
      • "words or phrases from the vocabulary are mapped to vectors of real numbers"
      • "Methods to generate this mapping include [...] explicit representation in terms of the context in which words appear."
  • word sense disambiguation
    • wiki
      • "identifying which sense of a word (i.e. meaning) is used in a sentence, when the word has multiple meanings"
  • POS tags
    • Part-of-speech tagging (wiki)
      • aka grammatical tagging, word-category disambiguation
      • simplified: "identification of words as nouns, verbs, adjectives, adverbs, etc."
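
A toy illustration of "a word is characterized by the company it keeps": each word is represented by counts of the words that occur near it, and two words are compared by the cosine of their count vectors. This is only one possible "explicit representation in terms of the context"; the corpus, the window size of 2, and the names `context_vectors` and `cosine` are assumptions of this sketch.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up toy corpus, for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
]
vecs = context_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))   # similar contexts
print(cosine(vecs["cat"], vecs["milk"]))  # different contexts
```

"cat" and "dog" occur in nearly the same contexts here, so their similarity comes out higher than that of "cat" and "milk".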

ML

  • support vector machine

Measures

  • Fleiss' kappa
    • wiki
      • "statistical measure for assessing the reliability of agreement between a fixed number of raters when assigning categorical ratings to a number of items or classifying items"
  • f-score
    • wiki
      • "can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst at 0"
  • cross-validation
  • Levenshtein distance
    • wiki
      • "the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other"
      • = "edit distance" as used in information retrieval?
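
Two of the measures above as small sketches. `f1_score` computes the F1 case of the f-score, i.e. the harmonic mean of precision and recall, from raw counts; `levenshtein` is the usual dynamic-programming formulation of the definition quoted above. Function and parameter names are chosen for this note only.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def levenshtein(a, b):
    """Minimum number of single-character edits needed to turn a into b."""
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


print(f1_score(tp=8, fp=2, fn=2))
print(levenshtein("kitten", "sitting"))
```

"kitten" to "sitting" takes two substitutions (k to s, e to i) and one insertion (g).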

Network analysis

  • Louvain Modularity
    • wiki
    • "method to extract communities from large networks"

Software etc.