Created by Sarah Oberbichler

NLP Course for Digital Humanities and Cultural Studies

Welcome to the repository of the NLP course for Digital Methods in the Humanities.

About the Course

This course introduces Natural Language Processing (NLP) and its applications in the digital humanities. It is part of the Master's program "Digital Methods for Humanities and Cultural Studies (DMGK)" in Mainz.

Course Website: https://ieg-dhr.github.io/NLP-Course4Humanities_2024/

Course Contents

The course covers the following topics:

  • Introduction to NLP, Jupyter Notebooks, and Python
  • Using spaCy for NLP tasks
  • Introduction to the German Newspaper Portal and its API
  • Recent advances in NLP: Transformer models for semantic search and text similarity (word embeddings)
  • Recent advances in NLP: Large Language Models (LLMs) for semantic text extraction (article segmentation) and post-OCR correction
  • Named Entity Recognition (NER) and Text Classification

Repository Structure

  • index.html: Main page of the course
  • styles.css: CSS stylesheet for the course website
  • materials/: Folder for course materials and resources

License