GOOFER = Generator Of One-liners From Examples with Ratings
This is a framework for generating humour from examples. It was created for my master's thesis "Automatic Joke Generation: Learning Humour from Examples".
An implementation of a system built on this framework is also provided: GAG, the Generalised Analogy Generator. GAG generates "I like my X like I like my Y, Z" jokes from rated examples. The training data set was collected with our platform, JokeJudger.com. The implementation of JokeJudger and the collected data are also made available.
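To make the joke template concrete, here is a minimal sketch of how three template values fill the "I like my X like I like my Y, Z" pattern. The class name, method name and example values are hypothetical and chosen for illustration only; this is not GAG's actual code, which additionally learns from rated examples which values to pick.

```java
// Illustrative sketch only: not part of the GAG code base.
final class AnalogyTemplateSketch {
    // The joke template named in this README, with slots for X, Y and Z.
    static final String TEMPLATE = "I like my %s like I like my %s, %s";

    // Fills the three template slots with the given values.
    static String fill(String x, String y, String z) {
        return String.format(TEMPLATE, x, y, z);
    }

    public static void main(String[] args) {
        // Example values chosen purely for illustration.
        System.out.println(fill("coffee", "war", "cold"));
        // prints: I like my coffee like I like my war, cold
    }
}
```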
If you want to reference this work, you can use the following BibTeX entry:
@inproceedings{winters2018automaticjokegeneration,
issn = {0302-9743},
journal = {Distributed, Ambient and Pervasive Interactions: Technologies and Contexts},
pages = {360--377},
volume = {10922 LNCS},
publisher = {Springer International Publishing},
isbn = {9783319911304},
year = {2018},
title = {Automatic joke generation: Learning humor from examples},
language = {eng},
author = {Winters, Thomas and Nys, Vincent and De Schreye, Danny},
keywords = {Computational humor},
organization = {Streitz, Norbert}
}
- Setting up the Java environment: For GAG to work, Java SE 8 and its JDK need to be installed. We also recommend using IntelliJ for opening the code.
- Download required repositories: Aside from our Google Ngram to MySQL converter tool, this framework also depends on our text-util repository, our generator-util repository and our DatamuseAPI Java library. All of them should be cloned into a folder next to the `goofer` folder.
- Setting up the required Google N-gram databases: For GAG to work, Google Ngrams need to be present in a MySQL database. More specifically, both the English One Million 1-grams and 2-grams need to be loaded, using our Java Google Ngram to MySQL tool, into a database that follows the database design specified in the repository. Loading this database will take several hours.

  We recommend the following steps:
  - Create a database called `ngram` using a MySQL server such as WAMP.
  - Forward engineer the `database-model.mwb` file to this database, e.g. using MySQL Workbench.
  - Load the database using our google-ngrams-to-mysql tool with the following arguments (don't forget to add arguments to link to your database if it is not a localhost database called `ngram`):

    For 1-grams: `-folder [FOLDER_OF_UNZIPPED_NGRAM_CSVS] -filePrefix googlebooks-eng-1M-1gram-20090715- -n 1 -allowedRegex lowercase -endIndex 10`

    For 2-grams: `-folder [FOLDER_OF_UNZIPPED_NGRAM_CSVS] -filePrefix googlebooks-eng-1M-2gram-20090715- -n 2 -allowedRegex lowercase -endIndex 100 -constrainer adjectivenoun`
- Install Gradle: You also need Gradle to download all dependencies from `build.gradle`. Gradle support is built into IntelliJ, so this should work out of the box when using that environment.
- Running the GAG system: The GAG system can be executed by using Java to run the main method of the `GeneralisedAnalogyGenerator.java` class. It supports the following arguments:

Argument | Description |
---|---|
-outputModel | Path where the program should output the training model file |
-output | Path where the program should output the training model file |
-maxSimilarity | If given, GAG only outputs a generated joke if it differs enough from previous generations (no more similar words than this value) |
-outputWords | Allow the template values in the training model file (although classifiers have difficulty dealing with strings) |
-inputJokes | Path to the input jokes file |
-sortRating | Whether or not the output should be sorted by rating |
-minScore | Minimal score threshold to be considered a good joke |
-sqlHost | Host of the SQL database of the n-grams database |
-sqlPost | Port of the SQL database of the n-grams database |
-sqlUser | Username of the SQL database of the n-grams database |
-sqlPassword | Password of the SQL database of the n-grams database |
-sqlDB | Database name of the SQL database of the n-grams database |
-dictionary | Path to the WordNet dictionary |
-posFile | Path to the Stanford POS tagger |
-classifier | The classifier to use to learn from the input jokes |
-aggregator | The rating aggregator to combine the ratings with |
-x | First template value of an analogy joke |
-y | Second template value of an analogy joke |
-z | Third template value of an analogy joke |
-generator, -g | Type of template values generator: sql, datamuse or twogram |
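
As a rough illustration of how the arguments in the table might be combined, the sketch below assembles a program-argument list from a few of the documented flags. It assumes each of these flags is followed by a single value (`-flag value`), which this README does not state explicitly. The class name, the helper `toArgs` and all values (including the `data/jokes.csv` path) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: builds an argument list, it does not run GAG.
final class GagArgumentsSketch {
    // Turns option names and values into "-name value" pairs, keeping insertion order.
    // Assumption: every flag used here takes exactly one value.
    static List<String> toArgs(Map<String, String> options) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> option : options.entrySet()) {
            args.add("-" + option.getKey());
            args.add(option.getValue());
        }
        return args;
    }

    public static void main(String[] ignored) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("inputJokes", "data/jokes.csv"); // hypothetical path
        options.put("sqlHost", "localhost");         // database host
        options.put("sqlDB", "ngram");               // database name used in the setup steps
        options.put("generator", "twogram");         // one of: sql, datamuse, twogram
        options.put("x", "coffee");                  // hypothetical first template value

        System.out.println(String.join(" ", toArgs(options)));
        // prints: -inputJokes data/jokes.csv -sqlHost localhost -sqlDB ngram -generator twogram -x coffee
    }
}
```

The resulting string could then be pasted into the program arguments of an IntelliJ run configuration for `GeneralisedAnalogyGenerator`, or passed on the command line when running the main method directly.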