A Deep Generative Framework for Paraphrase Generation

Model:

This is an implementation of A Deep Generative Framework for Paraphrase Generation by Gupta et al. (AAAI 2018), using the character-aware token embeddings from Kim et al., Character-Aware Neural Language Models. The code uses Samuel Bowman's Generating Sentences from a Continuous Space implementation as a base, available here.

Usage

Before training the model, you must train word embeddings for both the questions and their paraphrases:

$ python train_word_embeddings.py --num-iterations 1200000
$ python train_word_embeddings_2.py --num-iterations 1200000

These scripts train word embeddings as described in Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality.

Parameters:

--use-cuda

--num-iterations

--batch-size

--num-sample –– number of tokens sampled from the noise distribution
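The embedding scripts follow the skip-gram objective with negative sampling from the Mikolov et al. paper: each true (center, context) pair is contrasted with --num-sample tokens drawn from a noise distribution. A minimal NumPy sketch of that loss (illustrative only; the function name `neg_sampling_loss` and the shapes are assumptions, not names from this repo):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(center_vec, context_vec, noise_vecs):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    center_vec:  (d,)   embedding of the center word
    context_vec: (d,)   embedding of the true context word
    noise_vecs:  (k, d) embeddings of k noise tokens, where k is the
                 value passed as --num-sample
    """
    # Push the true context toward the center word...
    pos = -np.log(sigmoid(context_vec @ center_vec))
    # ...and push the sampled noise tokens away from it.
    neg = -np.log(sigmoid(-(noise_vecs @ center_vec))).sum()
    return pos + neg

rng = np.random.default_rng(0)
d, k = 16, 5
loss = neg_sampling_loss(rng.normal(size=d),
                         rng.normal(size=d),
                         rng.normal(size=(k, d)))
```

Increasing --num-sample makes the noise term a better approximation of the full softmax at the cost of more computation per pair.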

To train the model, use:

$ python train.py --num-iterations 140000

Parameters:

--use-cuda

--num-iterations

--batch-size

--learning-rate

--dropout –– probability of units to be zeroed in decoder input

--use-trained –– resume from a previously trained model

To sample data after training, use:

$ python test.py

Parameters:

--use-cuda

--num-sample
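At test time the model draws latent codes from the standard normal prior and decodes each into a paraphrase; --num-sample controls how many codes are drawn. A toy sketch of that loop, assuming a `decode_step` callable that stands in for the trained decoder (all names here are hypothetical, not the repo's API):

```python
import numpy as np

def sample_paraphrases(decode_step, vocab, num_sample=5,
                       latent_dim=64, max_len=20, seed=0):
    """Draw num_sample latent codes from N(0, I) and greedily decode each.

    decode_step(z, tokens, state) -> (logits, state) is a stand-in for
    one step of the trained decoder conditioned on the latent code z.
    """
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(num_sample):
        z = rng.normal(size=latent_dim)   # z ~ N(0, I)
        tokens, state = [], None
        for _ in range(max_len):
            logits, state = decode_step(z, tokens, state)
            tok = int(np.argmax(logits))  # greedy decoding
            if vocab[tok] == "<eos>":
                break
            tokens.append(tok)
        outputs.append(" ".join(vocab[t] for t in tokens))
    return outputs
```

Because each paraphrase comes from a different draw of z, raising --num-sample yields more diverse candidate paraphrases for the same input question.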