πŸ‘©β€πŸ’»πŸ‘ΎπŸDigital Methods for Analysing Texts

Digital Methods for Analysing Texts

This course familiarises PhD students with the main text mining techniques used in social science and develops basic skills in digital methods. On completion, you will be familiar with the theoretical and methodological underpinnings of natural language processing and be able to conduct a basic text analysis. Throughout the course we will focus on applying text analysis to empirical data, where possible related to the students' own research.

Students will become familiar with digital methods in text analysis as a flexible approach that comes with a practical set of research instruments to empirically investigate a range of questions in social science. They will learn how to approach and manage text data, analyse texts, and visualise the results.

Course schedule

| Session date | Session | Lecture topic | Seminar topic |
|---|---|---|---|
| 12 April | 1 | Introduction to text mining | Import text data |
| 14 April | 2 | Analysing text | Methods for text preprocessing |
| 19 April | 3 | Analysing words | Methods for word analysis |
| 21 April | 4 | Topic modelling | Methods for analysing topics |
| 26 April | 5 | NLP Ethics + live coding | Biases and real example analysis |
| 28 April | 6 | Text mining in the real world | Analysing your own text |

Eligibility

You must be a PhD student at King's, Queen Mary or Imperial, and you must have already registered as a LISS DTP student via the following link: https://www.liss-dtp.ac.uk/registration/.

Reading list

1. Intro

πŸ“ Turing, A.M. and Haugeland, J., 1950. Computing machinery and intelligence. The Turing Test: Verbal Behavior as the Hallmark of Intelligence, pp.29-56.

πŸ“ Weizenbaum, J., 1966. ELIZAβ€”a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), pp.36-45.

πŸ“ Hutchins, W.J., 2004, September. The Georgetown-IBM experiment demonstrated in January 1954. In Conference of the Association for Machine Translation in the Americas (pp. 102-114). Springer, Berlin, Heidelberg.

🌍 https://www.ibm.com/ibm/history/exhibits/701/701_translator.html

πŸ“ Bender, E.M., Hovy, D. and Schofield, A., 2020, July. Integrating ethics into the NLP curriculum. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts (pp. 6-9).

2. Analysing Text

πŸ“ Friedl, J.E., 2006. Mastering regular expressions. " O'Reilly Media, Inc.". [Introduction]

πŸ“ Anandarajan, M., Hill, C. and Nolan, T., 2019. Term-document representation. In Practical text analytics (pp. 61-73). Springer, Cham. [Chapter 4 and 5]

πŸ“ Bird, S., Klein, E. and Loper, E., 2009. Natural language processing with Python: analyzing text with the natural language toolkit. " O'Reilly Media, Inc.". [Chapter 3 and 7]

3. Analysing words

πŸ“ Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

πŸ“ Rong, X., 2014. word2vec parameter learning explained. arXiv preprint arXiv:1411.2738.

πŸ“ Bird, S., Klein, E. and Loper, E., 2009. Natural language processing with Python: analyzing text with the natural language toolkit. " O'Reilly Media, Inc.". [Chapter 9]

4. Topic modelling

πŸ“ Blei, D.M., Ng, A.Y. and Jordan, M.I., 2003. Latent dirichlet allocation. Journal of machine Learning research, 3(Jan), pp.993-1022.

πŸ“ Anandarajan, M., Hill, C. and Nolan, T., 2019. Term-document representation. In Practical text analytics (pp. 61-73). Springer, Cham. [Chapter 7]

πŸ“ Bird, S., Klein, E. and Loper, E., 2009. Natural language processing with Python: analyzing text with the natural language toolkit. " O'Reilly Media, Inc.". [Chapter 6]

5. NLP Ethics

πŸ“ Bender, E.M., Gebru, T., McMillan-Major, A. and Shmitchell, S., 2021, March. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?🦜. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623).

πŸ“ Bolukbasi, T., Chang, K.W., Zou, J.Y., Saligrama, V. and Kalai, A.T., 2016. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. Advances in neural information processing systems, 29.

πŸ“ Caliskan, A., Bryson, J.J. and Narayanan, A., 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), pp.183-186.

πŸ“ Garg, N., Schiebinger, L., Jurafsky, D. and Zou, J., 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), pp.E3635-E3644.

6. Text mining in the real world

πŸ“ Anandarajan, M., Hill, C. and Nolan, T., 2019. Term-document representation. In Practical text analytics (pp. 61-73). Springer, Cham. [Chapter 12]

πŸ“ Bird, S., Klein, E. and Loper, E., 2009. Natural language processing with Python: analyzing text with the natural language toolkit. " O'Reilly Media, Inc.". [Chapter 11].

General Bibliography

📕 Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied text analysis with Python: Enabling language-aware data products with machine learning. O'Reilly Media, Inc.

📕 Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc.

📕 Eisenstein, J. (2018). Natural language processing.

📕 Hovy, D. (2020). Text Analysis in Python for Social Scientists: Discovery and Exploration. Cambridge University Press.

📕 Manning, C., & Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press.

🌍 https://course.spacy.io

🌍 https://www.nltk.org
