/artificial-self-AMLD-2020

Workshop material for the AMLD 2020 workshop on "Meet your Artificial Self: Generate text that sounds like you"


Meet your Artificial Self: Generate text that sounds like you

This repository contains all resources for the Applied Machine Learning Days workshop Meet your Artificial Self: Generate text that sounds like you.

In this workshop, participants download their own chat logs and build a chat bot that generates text similar to their own writing. As an alternative to chat logs, this repository also provides a number of other conversational and non-conversational datasets.

Gitter

Feel free to join our Gitter chat during the workshop.

Slides

Find the workshop slides here.

Usage

The workshop is split into three tasks. You can run each task locally by cloning this repository, or in Colab using the notebooks (see links below). If you run locally, make sure you have access to one or more GPUs, Python 3.6+, and sufficient storage space. More detailed instructions are provided in the subfolders.
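Before running locally, the requirements above can be checked with a short script. This is a minimal sketch and not part of the workshop material: the function name `check_environment`, the 5 GB storage threshold, and the use of PyTorch for GPU detection are all assumptions made for illustration.

```python
import shutil
import sys

MIN_PYTHON = (3, 6)   # requirement stated in this README
MIN_FREE_GB = 5       # assumption: illustrative threshold, not a workshop figure


def check_environment(path=".", min_free_gb=MIN_FREE_GB):
    """Return a dict describing whether the local setup looks usable."""
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    try:
        # Assumption: the notebooks rely on PyTorch; skip the check otherwise.
        import torch
        has_gpu = torch.cuda.is_available()
    except ImportError:
        has_gpu = None  # unknown, because PyTorch is not installed
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "free_gb": round(free_gb, 1),
        "storage_ok": free_gb >= min_free_gb,
        "gpu_available": has_gpu,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `gpu_available` value of `None` only means the check could not be performed; the task subfolders remain the authoritative source for the actual requirements.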

Task 1

Fine-tune GPT-2 on various datasets (including tweets, poetry, programming code, chess, music and more!). Thanks to @manueth for compiling the datasets!

➡️ Read more
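To give a feel for the data preparation that usually precedes fine-tuning, here is a minimal sketch that joins short documents (for example tweets) into training and validation text. It is an illustration under stated assumptions, not the workshop's own pipeline: `build_corpus`, the 10% validation split, and the choice to terminate every document with GPT-2's `<|endoftext|>` token are choices made for this example.

```python
import random

EOS = "<|endoftext|>"  # GPT-2's end-of-text token


def join_documents(documents):
    """Concatenate documents, terminating each one with the EOS token."""
    return "".join(doc + EOS for doc in documents)


def build_corpus(documents, val_fraction=0.1, seed=42):
    """Shuffle documents and return (train_text, val_text).

    Hypothetical helper: empty documents are dropped, and at least one
    document goes to validation when more than one is available.
    """
    docs = [doc.strip() for doc in documents if doc.strip()]
    random.Random(seed).shuffle(docs)
    n_val = max(1, int(len(docs) * val_fraction)) if len(docs) > 1 else 0
    return join_documents(docs[n_val:]), join_documents(docs[:n_val])


if __name__ == "__main__":
    train_text, val_text = build_corpus(["first tweet", "second tweet", "third tweet"])
    print(len(train_text), len(val_text))
```

Marking document boundaries explicitly lets the model learn where one sample ends, which matters for short texts such as tweets.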

Task 2

We use the same approach of style transfer to train a conversational model from our chat logs. You can either use Chatistics to parse your own chat logs or you can use some of the provided resources. Thanks to @MasterScrat for compiling the conversational datasets!

➡️ Read more
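One way to turn a parsed chat log into training examples is to pair each of your own messages with the messages that preceded it. The sketch below assumes the log is already available as a list of `(sender, text)` tuples; that format, the `build_pairs` name, and the `max_context` parameter are hypothetical and do not reflect the actual Chatistics output.

```python
def build_pairs(messages, me, max_context=3):
    """Return (context, reply) pairs in which `me` wrote the reply.

    messages: list of (sender, text) tuples in chronological order.
    The context holds up to `max_context` preceding messages, each
    rendered as "sender: text".
    """
    pairs = []
    for i, (sender, text) in enumerate(messages):
        if sender != me or i == 0:
            continue  # skip other people's messages and replies without context
        context = messages[max(0, i - max_context):i]
        pairs.append(([f"{s}: {t}" for s, t in context], text))
    return pairs


if __name__ == "__main__":
    chat = [
        ("alice", "hi"),
        ("me", "hey"),
        ("alice", "lunch?"),
        ("me", "sure"),
    ]
    for context, reply in build_pairs(chat, me="me", max_context=2):
        print(context, "->", reply)
```

Training only on pairs whose reply is yours is what pushes the model toward your writing style rather than that of your conversation partners.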

Task 3

We extend the approach from Task 2 by introducing multi-task learning, improving data preprocessing, and adding token types.

➡️ Read more
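The idea of token types can be illustrated without any model code: alongside the token sequence, a parallel sequence records which speaker each token belongs to. The sketch below is an assumption-laden illustration, not the workshop's implementation. The special tokens `<speaker_me>` and `<speaker_other>`, the `build_input` helper, and the pre-split word lists (a real setup would use GPT-2's tokenizer) are all hypothetical.

```python
SPEAKER_ME = "<speaker_me>"        # hypothetical special token
SPEAKER_OTHER = "<speaker_other>"  # hypothetical special token


def build_input(turns):
    """Build parallel token and token-type sequences for a conversation.

    turns: list of (is_me, words) where `words` is a list of tokens.
    Each turn is prefixed with its speaker token, and every token in the
    turn (including the prefix) is labelled with that speaker as its type.
    """
    tokens, token_types = [], []
    for is_me, words in turns:
        speaker = SPEAKER_ME if is_me else SPEAKER_OTHER
        segment = [speaker] + list(words)
        tokens.extend(segment)
        token_types.extend([speaker] * len(segment))
    return tokens, token_types


if __name__ == "__main__":
    tokens, token_types = build_input([
        (False, ["hi", "there"]),
        (True, ["hey"]),
    ])
    for token, token_type in zip(tokens, token_types):
        print(f"{token:16} {token_type}")
```

In a model, each token type would typically be mapped to an embedding that is combined with the token embedding, giving the network an explicit signal about who is speaking at every position.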

Credits