These lab sessions are designed to help you follow along with the contents presented during the lectures, and introduce you to the skills and tools needed to complete the final projects.
The lab sessions will be a mix of tutorials and exercises. The tutorials will present modern frameworks and tools to implement advanced NLP analyses and pipelines. The exercises are designed to teach you the skills needed for final projects. Here is a brief overview of the schedule:
Some notes:
- The core contents are covered in the first few weeks of the course to kickstart your work. Exercise sessions are dropped from week 6 onwards to allow you to focus on the final project.
- Participation in the lab sessions is highly encouraged, as they cover fundamental notions for the assignment portfolios and the final projects. Instructors will be available to answer questions and provide guidance.
The lab sessions make use of the Jupyter environment. You can use the following links to get started:
Alternatively, it is possible to use the notebooks via the Google Colab web environment simply by clicking on the button at the beginning of each notebook. If you’re running on Windows, we recommend following along using a Colab notebook. If you’re using a Linux distribution or macOS, you can use either approach described here. For an intro to the Colab environment, refer to:
Since the lab sessions will introduce you to open-source libraries such as spaCy, Stanza, scikit-learn, 🤗 Transformers and 🤗 Datasets, most of the first few sessions' contents are adapted from official tutorials and docs. Here is a non-exhaustive list of the most relevant sources for additional reference:
- Advanced NLP with spaCy
- Stanza tutorials
- spaCy Linguistic Features
- HuggingFace Course, Chapter 1
- HuggingFace Transformers Docs
- HuggingFace Datasets Docs
- Scikit-learn "Working with Text Data" Tutorial
- NLP class materials by Dirk Hovy
- HuggingFace "How to Generate" Tutorial
- A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes
- HuggingFace PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware
The file `requirements.txt` in this repository lists all the packages required to run the lab sessions. You can create a Python virtual environment (Python >= 3.6) and install them using the following commands:
```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Make sure the virtual environment is activated before running Jupyter. If you are using Colab, simply run the cell at the beginning of each notebook to install the required packages. Refer to Using a Python Virtual Environment for more details on how to create and activate a virtual environment. Alternatively, you can use Poetry to manage the dependencies.
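After installing, you can quickly verify that the environment is set up correctly. The following is a minimal sketch (the `check_packages` helper is hypothetical, not part of the course materials) that reports which of the lab's core libraries are missing from the active environment; note that scikit-learn is imported as `sklearn`:

```python
import importlib.util

def check_packages(names):
    """Return the subset of names that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Core libraries used in the lab sessions
missing = check_packages(["spacy", "stanza", "sklearn", "transformers", "datasets"])
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Re-run: pip install -r requirements.txt")
else:
    print("All core packages found.")
```

If any package is reported missing, double-check that the virtual environment is activated before re-running the install command.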
For any troubleshooting, please consult the FAQ before asking for help. You are encouraged to contribute to it by adding your solutions!
Arianna Bisazza is an Associate Professor in Computational Linguistics and Natural Language Processing at the Computational Linguistics Group of the University of Groningen. She is passionate about the study of human languages, how they differ from each other, and how they can be modeled by computational tools. Her primary interest is in the development of language technologies supporting a large variety of languages around the world. She is also interested in the new knowledge that computational models can reveal about the nature of language.

Gabriele Sarti is a PhD student in the Computational Linguistics Group of the University of Groningen. He is part of the Dutch consortium InDeep, working on interpretability for language generation and neural machine translation. Previously, he was a research scientist at Aindo and a research intern at Amazon Translate NYC. His research interests involve interpretability for NLP, human-AI interaction, and the use of behavioral signals such as eye-tracking patterns to improve language understanding systems.

Jirui Qi is a PhD student in the Computational Linguistics Group of the University of Groningen. He is part of the Dutch consortium LESSEN, and his research mainly focuses on low-resource conversational generation, the generalization of factual knowledge across languages, and prompt-based learning for classification.

Leonidas Zotos is a PhD student in the Computational Linguistics Group of the University of Groningen. He works at the intersection of language modelling and human learning, with a focus on multifaceted event understanding. His current focus is on multiple-choice assessment methods and how these tests can be better designed to improve long-term retention.
Please open an issue here on GitHub! This is the second year we are using these contents for the course, and although most of them come from battle-tested online tutorials, we are always looking for feedback and suggestions.
We thank our past students Georg Groenendaal, Robin van der Noord, Ayça Avcı and Remco Leijenaar for spotting errors in the course materials.
Teaching Assistants Alumni
2023
Ludwig Sickert was an MSc candidate in AI at the University of Groningen. He attended the IK-NLP course in 2022 and worked on interpreting formality in machine translation systems for his master's thesis under the supervision of Gabriele and Arianna. He served as TA for the 2023 edition of the course.
2022
Anjali Nair was an MSc candidate in AI at the University of Groningen. She served as teaching assistant for the 2022 edition of the course.