This workshop was originally prepared for the 2015 Digital Humanities @ Berkeley Summer Institute. It has since been taught elsewhere.
This course introduces students to modern quantitative text analysis techniques, with the ultimate goal of providing the skills necessary to apply the methods in their own research. We will use the open source programming language R. Demonstration corpora are provided.
- Acquiring and Preprocessing Texts
- Discriminating Words
- Dictionary Methods and Sentiment Analysis
- The Vector Space Model and the Geometry of Text (Multi-dimensional Scaling, Most Similar Texts, Clustering)
- Topic Models
- Quantifying Style: Grammar, Alliteration, and other Poetic Concerns
See the entire syllabus here.
This workshop will be using the R programming language. See the software requirements here.
Students are strongly encouraged to complete this brief tutorial to learn the basic syntax of the R programming language.