QHGCorpus

Dataset for CIKM 2018 long paper "Question Headline Generation for News Articles"

Description

The dataset can be downloaded at link (unpacks to 602 MB). Each line in QHG_corpus.txt holds one (news article, question headline) pair. The two fields are separated by two consecutive tab characters, in the following format:

<news article>\t\t<question headline>
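
As a rough illustration of the format above, the following Python sketch reads the corpus one pair at a time. The names `parse_line` and `read_pairs` are hypothetical helpers, not part of the release. The UTF-8 encoding is an assumption, as is the choice to split on the last double tab so that any tabs inside an article stay with the article.

```python
from typing import Iterator, Tuple

SEPARATOR = "\t\t"  # two consecutive tabs, as described in the format above


def parse_line(line: str) -> Tuple[str, str]:
    """Split one corpus line into an (article, headline) pair."""
    # Assumption: split on the LAST double tab, so stray tabs inside the
    # article text do not end up in the headline.
    article, sep, headline = line.rstrip("\r\n").rpartition(SEPARATOR)
    if not sep:
        raise ValueError("line has no double-tab separator")
    return article, headline


def read_pairs(path: str = "QHG_corpus.txt",
               encoding: str = "utf-8") -> Iterator[Tuple[str, str]]:
    """Lazily yield (article, headline) pairs, skipping blank lines."""
    # Assumption: the file is UTF-8 encoded; adjust `encoding` if it is not.
    with open(path, encoding=encoding) as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)
```

Because `read_pairs` is a generator, the unpacked file does not have to fit in memory at once. For example, `sum(1 for _ in read_pairs())` counts the pairs in a single pass.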

File

The dataset was collected from ByteDance (a popular news portal in China).

We release the original news collection of 343,696 news articles. The dataset used in the CIKM 2018 paper can be built from it by applying the corresponding preprocessing steps.

Citation

@inproceedings{Zhang:CIKM2018,
author = {Ruqing Zhang and Jiafeng Guo and Yixing Fan and Yanyan Lan and Jun Xu and Huanhuan Cao and Xueqi Cheng},
title = {{Question Headline Generation for News Articles}},
booktitle = {CIKM},
year = 2018
}