
(Meta)data Quality materials for Data Science Summer School 2023, Göttingen

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).


This repository contains materials for the (Meta)data Quality session, part of the Göttingen Data Science Summer School 2023. The first part of the session introduces the topic; in the second part, students learn how to work with real data. The repository contains the data and code for this second part, organised into four tasks:

  1. finding outliers in a CSV file
  2. counting elements in an XML file
  3. introduction to SHACL
  4. introduction to JSON Schema
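As a rough illustration of the first two tasks, the sketch below flags outliers in one CSV column using Tukey's interquartile-range (IQR) rule and counts element names in an XML document. It uses only the Python standard library. The sample data and the function names `iqr_outliers` and `count_elements` are hypothetical, and the IQR rule is one common choice among several. The session's own data and methods may differ.

```python
import csv
import io
import statistics
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data; the repository's real files are not reproduced here.
SAMPLE_CSV = """id,year
1,1998
2,2001
3,2003
4,1999
5,2002
6,2000
7,20023
"""

SAMPLE_XML = """<records>
  <record><title>A</title><creator>X</creator></record>
  <record><title>B</title><creator>Y</creator><creator>Z</creator></record>
</records>"""


def iqr_outliers(csv_text, column):
    """Return rows whose numeric value in `column` lies outside the
    Tukey fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (an assumed outlier rule)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(row[column]) for row in rows]
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [row for row in rows if not low <= float(row[column]) <= high]


def count_elements(xml_text):
    """Count how often each element name occurs anywhere in the XML tree,
    including the root element."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


if __name__ == "__main__":
    print(iqr_outliers(SAMPLE_CSV, "year"))
    print(count_elements(SAMPLE_XML))
```

With this sample data, the row with `year` 20023 is the only value outside the fences. In the element counts, `creator` appears three times and `record` twice.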

The code is written in Python, and the tools used are Python-based. Instructions for setting up virtual environments are available in the slides.