This repository contains code and notes for my Prototype Fund project, carried out mainly between 01.03.2019 and 01.09.2019. The topic: explaining machine learning and natural language processing using news comments as an example, and visualizing language change.
The work is divided into several sub-projects:
- Website for explaining ML and NLP, as well as investigating language change in online comments: kommentare.vis.one, code
- Backend to serve local views on word embeddings (used for kommentare.vis.one): ptf-kommentare-backend
- Python package to construct (stable) word embeddings for small data: hyperhyper
- Python package to clean text: clean-text
- Python package for common text preprocessing for German: german
- Python package to lemmatize German text: german-lemmatizer
- Benchmark for SVD implementations: sparse-svd-benchmark
Here is a short guide on how to create your own videos. An example video is here.
- Divide your data into time slices and create a word embedding for each slice
- Save each embedding in gensim's `KeyedVectors` format (using hyperhyper to create stable word embeddings is advised)
- Install ffmpeg
- `pip install git+https://github.com/jfilter/adjustText && pip install gensim scikit-learn matplotlib colormath`
- Adapt the code in this notebook (so you also need either Jupyter Lab or Jupyter Notebook installed)
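The first step above, dividing your data into time slices, might look like the following minimal sketch. The input format (an iterable of ISO date string / text pairs) and the function name are assumptions for illustration; adapt them to your corpus. Each slice would then be used to train one embedding (e.g. with hyperhyper) and saved via gensim's `KeyedVectors`.

```python
from collections import defaultdict
from datetime import datetime


def slice_by_year(comments):
    """Group (ISO date string, text) pairs into yearly buckets.

    Hypothetical helper: the input format depends on your corpus.
    Each resulting slice would get its own word embedding,
    trained e.g. with hyperhyper (not shown here).
    """
    slices = defaultdict(list)
    for date_str, text in comments:
        year = datetime.fromisoformat(date_str).year
        slices[year].append(text)
    return dict(slices)


comments = [
    ("2015-03-01", "first comment"),
    ("2015-07-15", "another one"),
    ("2016-01-02", "a later comment"),
]
print(slice_by_year(comments))
# two slices: 2015 with two comments, 2016 with one
```

Depending on your data size, finer slices (months, quarters) may work as well, as long as each slice still contains enough text for a stable embedding.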
Right now, it's not that easy to create these videos. However, it's doable and I'm willing to help you. The important parts of the code are thoroughly commented. Please contact me for assistance.
Two papers for a more scientific background:
- Hamilton et al.: Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change
- Hellrich et al.: The Influence of Down-Sampling Strategies on SVD Word Embedding Stability
Some more papers here.
This work was funded by the German Federal Ministry of Education and Research.