
Course page for TDT4310 Intelligent Text Analytics and Language Understanding, spring 2024.


TDT 4310 - Intelligent Text Analytics and Language Understanding - Spring 2024

Throughout this course, we will explore many aspects of natural language processing, starting with the latest developments in language models, specifically large language models. From there, we go back to more fundamental topics such as part-of-speech tagging, grammars, and dependency parsing, along with tasks like sentiment analysis and topic modeling.

All labs are provided as Jupyter Notebooks (.ipynb). The first lab consists only of questions and answers in Markdown cells, so you can get familiar with the format. The remaining labs require you to use the notebook environment properly, with a mix of Markdown and code cells.
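As a rough illustration of what "a mix of Markdown and code cells" means on disk: an .ipynb file is JSON with a top-level `cells` list, where each cell carries a `cell_type`. The sketch below counts cells by type using only the standard library. The helper name `count_cell_types` and the tiny example notebook are made up for illustration and are not part of the course material.

```python
import json
from collections import Counter


def count_cell_types(notebook_json: str) -> Counter:
    """Count cells by type in a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


# A tiny hand-written notebook (hypothetical content) with one Markdown
# cell and one code cell, following the nbformat 4 layout.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["## Question 1"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
})

counts = count_cell_types(example)
print(counts["markdown"], counts["code"])  # prints: 1 1
```

To inspect a real lab file, read its text with `Path("lab2_exercises.ipynb").read_text(encoding="utf-8")` and pass the result to the same function.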

You must pass all labs to be eligible for the exam.

🔧 Lab setup

Each lab has files starting with the prefix lab{N}, where $N \in \{1, 2, 3, 4, 5\}$. Each lab has at least two files:

  • lab{N}_description.md - a description of the lab
  • lab{N}_exercises.ipynb - the main notebook with the exercises
    • you will submit this file to Blackboard

📝 Delivery

By the deadline for each lab, you will submit your lab{N}_exercises_{your-username}.ipynb file to Blackboard. You can submit as many times as you want - only the last submission will be considered.
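The naming convention above can be sketched in a few lines of Python. This sketch assumes the delivered file is simply a copy of lab{N}_exercises.ipynb with your username appended, which the text implies but does not state outright. The function names and the username `olanor` are hypothetical.

```python
import shutil
from pathlib import Path

VALID_LABS = range(1, 6)  # labs 1 through 5, as listed under Lab setup


def submission_name(lab: int, username: str) -> str:
    """Build the delivery filename, e.g. lab3_exercises_olanor.ipynb."""
    if lab not in VALID_LABS:
        raise ValueError(f"lab must be between 1 and 5, got {lab}")
    if not username:
        raise ValueError("username must not be empty")
    return f"lab{lab}_exercises_{username}.ipynb"


def prepare_submission(lab: int, username: str, folder: Path = Path(".")) -> Path:
    """Copy lab{N}_exercises.ipynb to the delivery filename (assumed workflow)."""
    source = folder / f"lab{lab}_exercises.ipynb"
    target = folder / submission_name(lab, username)
    shutil.copyfile(source, target)  # the original notebook is left untouched
    return target
```

For example, `prepare_submission(3, "olanor")` run inside the lab folder would create lab3_exercises_olanor.ipynb next to the original notebook, ready to upload to Blackboard.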

📆 Schedule

| Lab | Link | Published | Deadline | Topic | Libraries | Chapters |
|-----|------|-----------|----------|-------|-----------|----------|
| 1 | Lab1 | Jan. 8 | Jan. 22 | Large language models | transformers | - |
| 2 | Lab2 | Jan. 22 | Feb. 5 | Tokenization, introduction to word vectors and language modeling | NLTK | 2, 3 |
| 3 | Lab3 | Feb. 5 | Feb. 19 | Part-of-speech tagging, stemming/lemmatization, TF-IDF | NLTK, spaCy | 4, 5, 6 |
| 4 | Lab4 | Feb. 19 | Mar. 4 | WordNet and SentiWordNet, dependency parsing, POS chunking | spaCy, scikit-learn | 7, 8 |
| 5 | Lab5 | Mar. 4 | Mar. 18 | Unsupervised topic modeling and named entities | Gensim | 9, 10, 11 |

📚 Curriculum

The course curriculum is mostly based on the book Getting Started with Natural Language Processing by Ekaterina Kochmar (2022). It is available at Akademika.