# DH Benelux 2015: A method for cleaning 19th-century text, with examples from Transactions of the Royal Irish Academy 1800-1899

MONDAY 8 JUNE 2015 15.30 – 16.30 Parallel Paper Sessions C

Link to Abstract

Talk outline:

  1. Introduction to the Journal
    1. History of the RIA
    2. Irish Context
    3. Statistics about the corpus
  2. Getting from the texts to .txts
    1. JSTOR & OCR
    2. Complications: Typographical
    3. Complications: Linguistic
    4. Complications: Layout
  3. Topic Modelling the Journals
    1. Cleanup Process
      1. Stop Words
      2. Spell checking
  4. Results: the Topics and what we learned about the RIA
  5. Results: a Process for cleaning text someone else has OCR'd
  6. Conclusions
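
As a rough illustration of the cleanup step named under item 3 of the outline (stop-word removal followed by a spelling check), here is a minimal sketch. It is not taken from the talk: the stop-word list, the dictionary, the function names, and the sample sentence with its made-up OCR-style misreading are all hypothetical, and a real run would load far larger word lists.

```javascript
// Hypothetical, deliberately tiny word lists for illustration only.
const STOP_WORDS = new Set(["the", "of", "and", "a", "in", "to"]);
const DICTIONARY = new Set(["royal", "irish", "academy", "transactions", "history"]);

// Lowercase the text and split it into alphabetic tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

// Drop stop words, then separate dictionary words from likely OCR errors.
function cleanTokens(text) {
  const kept = [];
  const suspect = [];
  for (const token of tokenize(text)) {
    if (STOP_WORDS.has(token)) continue;
    (DICTIONARY.has(token) ? kept : suspect).push(token);
  }
  return { kept, suspect };
}

// Made-up sample line containing one misread word.
const result = cleanTokens("Transactions of the Royal Irifh Academy");
console.log(result.kept);    // tokens found in the dictionary
console.log(result.suspect); // tokens flagged for manual review
```

Keeping the suspect tokens in a separate list, rather than discarding them, makes it possible to review them by hand before deciding whether to correct or drop them ahead of topic modelling.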