/UCL-NLP-Project-PointerGAN

UCL Statistical Natural Language Processing group project. Text summarization with seq-to-seq, pointer-generator, SeqGAN and PointerGAN.

Primary language: Jupyter Notebook

Statistical Natural Language Processing Group Project

Department of Computer Science, University College London

Collaborators: Yuze Yang, Ke Xu, Qinghong Zhou, Xinyu Shen


Recent sequence-to-sequence (seq-to-seq) models have achieved strong results in abstractive text summarization. Building on seq-to-seq with attention, the pointer-generator network with coverage handles out-of-vocabulary words by copying them from the source text and reduces word repetition in summaries. Its summaries, decoded with beam search, are used as the generator's input in a generative adversarial network (GAN) developed from SeqGAN. Reinforcement learning methods are used to evaluate the generator's output, and a convolutional neural network (CNN) with highway layers serves as the discriminator, whose feedback is used to update the summarization model.
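
The copy mechanism and coverage penalty mentioned above can be illustrated with a minimal sketch. This is not code from the project's notebooks; it follows the standard pointer-generator formulation, where the final word distribution is p_gen * P_vocab(w) plus (1 - p_gen) times the attention mass on source positions holding w, and the coverage loss sums min(attention, coverage) over source positions at each decoding step. The function names, argument names, and toy numbers are assumptions made for illustration only.

```python
def final_distribution(p_gen, vocab_dist, attention, source_ids, extended_size):
    """Mix the generation and copy distributions over an extended vocabulary.

    Hypothetical helper: ids >= len(vocab_dist) stand for out-of-vocabulary
    source words that can only be produced by copying.
    """
    if len(attention) != len(source_ids):
        raise ValueError("attention and source_ids must have the same length")
    if extended_size < len(vocab_dist):
        raise ValueError("extended_size must cover the fixed vocabulary")

    # Generation part: OOV slots start with zero probability.
    final = [p_gen * p for p in vocab_dist]
    final += [0.0] * (extended_size - len(vocab_dist))

    # Copy part: add the attention mass of each source position to its word id.
    for weight, word_id in zip(attention, source_ids):
        final[word_id] += (1.0 - p_gen) * weight
    return final


def coverage_loss(attention_steps):
    """Sum over decoding steps of sum_i min(attention_i, coverage_i).

    Coverage is the running sum of attention from previous steps, so
    attending to the same source position twice is penalized.
    """
    if not attention_steps:
        return 0.0
    coverage = [0.0] * len(attention_steps[0])
    loss = 0.0
    for attention in attention_steps:
        loss += sum(min(a, c) for a, c in zip(attention, coverage))
        coverage = [c + a for c, a in zip(coverage, attention)]
    return loss


if __name__ == "__main__":
    # Toy example: fixed vocabulary of 3 words, source of 2 tokens,
    # where the second source token (id 3) is out-of-vocabulary.
    dist = final_distribution(
        p_gen=0.8,
        vocab_dist=[0.5, 0.3, 0.2],
        attention=[0.6, 0.4],
        source_ids=[1, 3],
        extended_size=4,
    )
    print(dist)
    print(coverage_loss([[1.0, 0.0], [1.0, 0.0]]))
```

In the toy example, the out-of-vocabulary source word receives non-zero probability only through the copy term, and attending to the same source position on two consecutive steps yields a positive coverage loss.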


Download the full dataset here: https://ucsb.app.box.com/s/ap23l8gafpezf4tq3wapr6u8241zz358

Download the sep dataset (separated paragraphs) here: https://ucsb.app.box.com/s/7yq601ijl1lzvlfu4rjdbbxforzd2oag

Original Repo here: https://github.com/mahnazkoupaee/WikiHow-Dataset