This repository contains code for the assignments of an introductory Data Science course taught at RWTH Aachen University in Winter 22. The working datasets are a mix of synthetic and real-life data. The technologies used include pandas, sklearn, matplotlib, mlxtend, nltk, gensim, pm4py, Docker, Hadoop MapReduce, and shell scripts.
To obtain the datasets or for help reproducing the notebooks, please contact one of the contributors.
Contributors:
- Minh-Nghia Phan (minh.nghia.phan@rwth-aachen.de)
- Quang Truong (quang.truong@rwth-aachen.de)
- Khue Hoang (khue.hoang@rwth-aachen.de)
- Van Dao (van.dao@rwth-aachen.de)