/wordcount

Hadoop wordcount

Primary Language: Jupyter Notebook

Hadoop program to perform a word count of documentation.

  1. Graph runtime against datasets of varying size
  2. Update WordCount to ignore punctuation and HTML script
  3. Count the average use of every word per year
  4. Compute the max and min number of times a word appears across all addresses
  5. Compute the avg and std of word counts in 4-year windows starting from 1985 // can use input | mergestuff(files)
  6. In the year following each window, find words whose use exceeded the window avg by more than 2 stds (e.g. 89)
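
Steps 2, 5 and 6 can be sketched in plain Python before porting them to a Hadoop job. This is a minimal sketch, not the repository's implementation: the names `tokenize`, `count_words`, `window_stats` and `spike_words` are hypothetical, the per-year counts are assumed to already be available as a `{year: {word: count}}` dict (for example, collected from the WordCount output), the population standard deviation is an assumption, and only words seen inside the window are considered.

```python
import re
import statistics
from collections import Counter

# Assumption: "HTML script" means <script>/<style> blocks plus any other tags.
TAG_RE = re.compile(r"<(script|style)\b.*?</\1>|<[^>]+>", re.IGNORECASE | re.DOTALL)
# Words are runs of letters, optionally with one inner apostrophe (e.g. "don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Strip markup, lowercase, and return the words without punctuation."""
    text = TAG_RE.sub(" ", text)
    return WORD_RE.findall(text.lower())


def count_words(text):
    """Word counts for one document (what a mapper plus reducer would produce)."""
    return Counter(tokenize(text))


def window_stats(counts_by_year, start_year, width=4):
    """Per-word (mean, population std) of yearly counts in the window
    start_year .. start_year + width - 1. Missing years count as 0."""
    years = range(start_year, start_year + width)
    words = set()
    for year in years:
        words.update(counts_by_year.get(year, {}))
    stats = {}
    for word in words:
        series = [counts_by_year.get(year, {}).get(word, 0) for year in years]
        stats[word] = (statistics.mean(series), statistics.pstdev(series))
    return stats


def spike_words(counts_by_year, start_year, width=4, k=2.0):
    """Words whose count in the year right after the window exceeds
    the window mean by more than k standard deviations."""
    stats = window_stats(counts_by_year, start_year, width)
    post = counts_by_year.get(start_year + width, {})
    return sorted(
        word
        for word, count in post.items()
        if word in stats and count > stats[word][0] + k * stats[word][1]
    )


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    demo = {
        1985: {"tax": 10, "peace": 5},
        1986: {"tax": 12, "peace": 5},
        1987: {"tax": 8, "peace": 5},
        1988: {"tax": 10, "peace": 5},
        1989: {"tax": 11, "peace": 9},
    }
    print(tokenize("<p>Hello, world!</p><script>var x = 1;</script>"))
    print(spike_words(demo, 1985))
```

With a window of 1985 to 1988, the year checked is 1989. A word whose count is constant inside the window has a standard deviation of 0, so any increase in the following year flags it; whether that is desirable depends on the dataset.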