Recent sequence-to-sequence (seq-to-seq) models have led to successful abstractive text summarization. Extending seq-to-seq, a pointer-generator network with coverage and attention handles out-of-vocabulary words in the dataset and avoids repeating words in summaries. Its beam search results are used as the generator's input in a generative adversarial network (GAN) developed from SeqGAN. Reinforcement learning methods are used to evaluate the generator's output. A convolutional neural network (CNN) with highway layers then serves as the discriminator, and its feedback is used to update the summarization model.
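
To make the pointer-generator and coverage ideas concrete, here is a minimal sketch in plain Python. It follows the commonly cited pointer-generator formulation (mixing a generation distribution with a copy distribution, plus a coverage penalty) and is not necessarily how this repository implements it. The function names, the toy vocabulary, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy probabilities over an extended vocabulary.

    p_gen         -- probability of generating from the fixed vocabulary
    vocab_dist    -- dict mapping vocabulary word -> generation probability
    attention     -- attention weights over source positions (sums to 1)
    source_tokens -- source words, one per attention weight
    """
    final = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for word, prob in vocab_dist.items():
        final[word] += p_gen * prob
    # Copy part: add attention mass to whatever word sits at each source
    # position, so out-of-vocabulary source words get nonzero probability.
    for token, weight in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


def coverage_loss(attention, coverage):
    """Penalize attending again to positions that are already covered.

    coverage is the running sum of attention from earlier decoding steps.
    """
    return sum(min(a, c) for a, c in zip(attention, coverage))


# Toy example (assumed values): "zyzzyva" is not in the vocabulary.
vocab_dist = {"the": 0.5, "cat": 0.3, "[UNK]": 0.2}
source_tokens = ["the", "zyzzyva", "cat"]
attention = [0.1, 0.7, 0.2]

dist = final_distribution(0.6, vocab_dist, attention, source_tokens)
loss = coverage_loss(attention, coverage=[0.0, 0.9, 0.1])
```

In this toy case the out-of-vocabulary source word receives probability only through the copy term, which is how the model can emit words missing from its vocabulary. The coverage term grows when the current attention overlaps with positions that earlier steps already attended to, which discourages repetition.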
Download the full dataset here: https://ucsb.app.box.com/s/ap23l8gafpezf4tq3wapr6u8241zz358
Download the sep dataset (separated paragraphs) here: https://ucsb.app.box.com/s/7yq601ijl1lzvlfu4rjdbbxforzd2oag
Original repository here: https://github.com/mahnazkoupaee/WikiHow-Dataset