Bachelor-Thesis

A Comparison of Different Text- and Latent-Code-Based Unconditional Text Generation Models

About this Repository

This repository contains the code for our (Jacob Dudek and Gerrit Bartels) bachelor thesis. We implemented five neural networks and compared their performance on the task of unconditional text generation. We set up a thorough evaluation scheme using common automatic evaluation methods and supplemented these results with a human survey. Furthermore, we propose two evaluation metrics based on the Jensen-Shannon distance that help judge how well the underlying data distribution has been learnt.
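
The two proposed metrics compare summary distributions of the generated and the real data (sentence lengths and token counts) via the Jensen-Shannon distance. The snippet below is only a minimal sketch of the sentence-length variant, not the exact implementation from the thesis; the function names and the length cutoff are illustrative.

```python
from collections import Counter

import numpy as np
from scipy.spatial.distance import jensenshannon  # JS *distance* (square root of the JS divergence)


def length_distribution(sentences, max_len=60):
    """Normalised histogram of sentence lengths in tokens (lengths above max_len are clipped)."""
    counts = Counter(min(len(s.split()), max_len) for s in sentences)
    hist = np.array([counts.get(i, 0) for i in range(max_len + 1)], dtype=float)
    return hist / hist.sum()


def js_distance_sentence_lengths(generated, reference, max_len=60):
    """0 means the two length distributions match exactly; 1 (base 2) means they are disjoint."""
    return jensenshannon(length_distribution(generated, max_len),
                         length_distribution(reference, max_len),
                         base=2)
```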


Dataset

As the training dataset for all models, we used a monolingual news crawl from the Fifth Conference on Machine Translation (WMT20) that can be obtained directly from the conference website. It contains approximately 44M English news sentences extracted from online newspaper articles published throughout 2019. After applying our preprocessing steps (see the preprocessing notebook), we obtained a dataset of approximately 240k sentences with an average sentence length of 18.65 tokens and a vocabulary size of 6801.
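
The exact preprocessing lives in the notebook linked above; as a rough illustration (not the repository's code, and the file name is a placeholder), the reported corpus statistics could be recomputed from the preprocessed sentences like this:

```python
from collections import Counter


def corpus_statistics(path="preprocessed_sentences.txt"):
    """Recompute sentence count, average token length and vocabulary size."""
    with open(path, encoding="utf-8") as f:
        sentences = [line.split() for line in f if line.strip()]
    vocab = Counter(tok for sent in sentences for tok in sent)
    return {
        "num_sentences": len(sentences),                               # ~240k
        "avg_sent_length": sum(map(len, sentences)) / len(sentences),  # ~18.65 tokens
        "vocab_size": len(vocab),                                      # ~6801 types
    }
```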


Models

  • LSTMLM
  • GSGAN
  • GPT-2 Small
  • cVAELM
  • LaTextGAN


Evaluation Methods

To assess the capabilities of our models in unconditional text generation, we employed methods that evaluate the model outputs with respect to sentence diversity (D) and quality (Q), as well as how well the underlying data distribution was captured (C). We also conducted a survey to obtain an additional perspective on model performance. A short code sketch of the two BLEU-based measures follows the list below.

Automatic Measures:

  • JS Distance Sentence Lengths (C)
  • JS Distance Token Counts (C)
  • Test BLEU-4 (Q)
  • Self BLEU-4 (D)
  • Fréchet InferSent Distance (Q & D)
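
As referenced above, here is a minimal sketch of the two BLEU-based measures. It is not necessarily the exact configuration used in the thesis (tokenisation and smoothing are assumptions), but it shows the idea: Test BLEU-4 scores generated sentences against the test set (quality), while Self BLEU-4 scores each generated sentence against the other generated sentences (diversity; lower means more diverse).

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

_SMOOTH = SmoothingFunction().method1
_WEIGHTS_4 = (0.25, 0.25, 0.25, 0.25)  # uniform weights up to 4-grams


def test_bleu4(generated, test_corpus):
    """Average BLEU-4 of each generated sentence against the whole test corpus (quality)."""
    refs = [s.split() for s in test_corpus]
    scores = [sentence_bleu(refs, g.split(), weights=_WEIGHTS_4, smoothing_function=_SMOOTH)
              for g in generated]
    return sum(scores) / len(scores)


def self_bleu4(generated):
    """Average BLEU-4 of each generated sentence against all other generated sentences (diversity)."""
    tokenised = [s.split() for s in generated]
    scores = [sentence_bleu(tokenised[:i] + tokenised[i + 1:], hyp,
                            weights=_WEIGHTS_4, smoothing_function=_SMOOTH)
              for i, hyp in enumerate(tokenised)]
    return sum(scores) / len(scores)
```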

Human Evaluation

The survey was implemented in _magpie, a minimal architecture for generating portable interactive experiments, and was made available as a web-based online survey through the hosting service Netlify. We defined two tasks to elicit judgments about the overall quality of generation (Likert-scale rating) and the participants' likelihood of detecting whether a sentence was artificially generated (2-alternative forced-choice task).
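
One plausible reading of the forced-choice task (an assumption, not stated explicitly above) is that the confusion rate is the percentage of model-generated sentences that participants mistook for human-written ones. A minimal sketch under that assumption:

```python
def confusion_rate(judged_human):
    """`judged_human`: booleans, True if a generated sentence was judged to be human-written.

    Returns the percentage of generated sentences that fooled the participants
    (interpretation assumed, see note above).
    """
    return 100.0 * sum(judged_human) / len(judged_human)
```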


Results

| Metric | LSTMLM | cVAELM | GSGAN | LaTextGAN | GPT-2 Small | Real Data |
|---|---|---|---|---|---|---|
| Average Sent Length | 16.83 | 12.91 | 16.89 | 18.06 | 17.25 | 16.66 |
| JS Distance Sent Length | 0.1471 | 0.4334 | 0.1677 | 0.3487 | 0.1206 | 0.0205 |
| JS Distance Token Counts | 0.1441 | 0.2963 | 0.1437 | 0.5111 | 0.2444 | 0.1286 |
| Top 12 Token Overlap | 12/12 | 10/12 | 12/12 | 7/12 | 11/12 | 12/12 |
| Test BLEU-4 | 0.3136 | 0.0544 | 0.3258 | 0.2563 | 0.4536 | 0.3301 |
| Self BLEU-4 | 0.3235 | 0.0904 | 0.3463 | 0.6746 | 0.5374 | 0.3282 |
| FID | 0.369 | 0.9932 | 0.3606 | 1.9926 | 0.7368 | 0.3456 |

Results of the automatic evaluation methods applied to all models and, for reference, to the test data itself.

| Metric | LSTMLM | cVAELM | GSGAN | LaTextGAN | GPT-2 Small | Real Data |
|---|---|---|---|---|---|---|
| Average Fluency Rating | 3.0704 | 1.9296 | 3.1861 | 1.704 | 3.9025 | 4.3948 |
| Confusion Rate (%) | 23.81 | 9.93 | 20.37 | 9.03 | 29.82 | - |

Results of the human evaluation applied to all models and, for reference, to the test data itself.


Example Sentences

LSTMLM

  • elizabeth warren suggests trump would win the u.s through congress, whereas president trump by his th year race as he staged a century.
  • unable to read live in recent times, china is not long term.
  • please note that radio had a site of panic and pre recorded surveillance books in the afternoon little body.
  • and in san diego that may have been trumps remarks after a bitter short tournament.
  • government employees, women and organisations have been focused on improving care and role to ensure guests be held responsible for their personal data.

GSGAN

  • should multi billion dollar corporations zero emissions by ?
  • the mother of a girl next to her was pushed too hard.
  • labour responded that they should not vote by the snp, then we would need to get brexit done.
  • but another west london, royal republic, won european international womens semi finals
  • our future brexit will turn us once again, he said during his three day visit.

GPT-2 Small

  • when the new government started being introduced in october, there was no such thing as a result that could ever take place.
  • some lawmakers are going to move forward in the next phase of the senate in a week, as congress does.
  • she said: it did not feel right and i did not want this to be happening at all.
  • however, he was left with a six year old who left with the job over £ .

cVAELM

  • ministers way.
  • twitter will you fell aside an additional public supply chain of women if
  • nothing every divided on february on me on, but we for.
  • it once certainly normally neither this all their scores remain on.
  • professional annually.

LaTextGAN

  • thirds he need kong he rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka rt.com lanka
  • dow is not united do well in europe, but will be an interview.
  • rt.com rt.com just angeles rt.com feel angeles thrones am angeles have knew the people, in that am not on .
  • there two do in trump and an emergency and go on an emergency services to %.
  • $ president a need to and no deal to climate change u.s border on monday.