/AUC_TMCI_2021

Materials for "Text Mining", a course at Amsterdam University College.

Text Mining 2020/21

Binder

Amsterdam University College -- Text Mining -- Winter/Spring 2021.

Contents

You can use the Hello World notebooks to check that everything is working.

| Week | Topic | Materials |
|------|-------|-----------|
| 1 | Introduction and Python refresher | slides + notebooks 1, 2, 3, 4, 5 |
| 2 | Introduction to NLP and NLP pipelines | slides + notebook |
| 3 | Language modelling | |
| 4 | Vector space semantics | |
| 5 | Word embeddings | |
| 6 | Machine learning fundamentals and PyTorch | |
| 7 | Text classification | |
| 8 | Advanced architectures and NER | |
| 9 | Web scraping and APIs | |
| 10 | Recommender systems | |
| 11 | Creating annotated corpora and sentiment analysis | |
| 12 | Clustering and topic modelling | |
| 13 | Trendy research topics | |

Group projects

See the projects folder for info.

Set-up

  1. Clone the repository locally: git clone https://github.com/Giovanni1085/AUC_TMCI_2019.git
  2. Get updates (from time to time): git pull
  3. Create a conda environment: conda create -n myenv python=3.7 anaconda (where myenv is the environment name)
  4. Activate it: conda activate myenv
  5. Install packages (see the requirements.txt file), e.g. conda install pandas
  6. Launch a Jupyter notebook: jupyter notebook
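
Once the environment is active, a quick way to confirm that the packages are in place is to ask Python whether it can locate them. The following is a minimal sketch, not part of the course materials: the helper name missing_modules is hypothetical, and the list of module names is an assumption that you would fill in by hand from requirements.txt (only pandas is taken from the steps above).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: "pandas" comes from step 5; add the import names of the
    # other packages listed in requirements.txt by hand.
    wanted = ["pandas"]
    print("Python", sys.version.split()[0])
    missing = missing_modules(wanted)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed modules can be located.")
```

Note that the import name can differ from the name used to install a package, so the list should contain import names. The check only locates modules; it does not import them or verify versions.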

Alternatively, use Binder (link above).

A more detailed guide to setting up your environment, with multiple options, is also available.

Credits

  • The previous-year edition of this course.
  • Michael Repplinger, who ran the 2018/19 edition, and Gianluca Lebani, who ran the 2017/18 edition.
  • Giovanni Colavizza and Matteo Romanello, Applied Data Analysis course for the Oxford Digital Humanities Summer School
  • James Hetherington and Giovanni Colavizza, Research Software Engineering with Python

License

Everything in this repository which is not already attributed to someone else is released under CC BY 4.0.