A simple script for extracting plain text from the arXiv dataset: https://www.kaggle.com/Cornell-University/arxiv
Requirements: pdfminer.six==20201018, p_tqdm==1.2
Example of one extracted record:
{
"id": "2010.01447",
"title": "GraphDialog: Integrating Graph Knowledge into End-to-End Task-Oriented Dialogue Systems",
"abstract": "End-to-end task-oriented dialogue systems aim to generate system responses... ",
"introduction": "Task-oriented dialogue systems aim to help user accomplish specific tasks via natural language interfaces ...",
"related work": "Task-oriented dialogue system has been a longstanding studied topic...",
"proposed model": "Our proposed model consists of three components: an encoder...",
"experiments": "4.1 Dataset To validate the efficacy of our proposed model...",
"acknowledgements": "We would like to thank...",
"all_contents": "GraphDialog: Integrating Graph Knowledge..."
}
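
The record above can be approximated with the sketch below. It is not the script itself: the heading pattern, the function names (split_sections, pdf_to_record, convert_all) and the choice to treat only top-level numbered headings plus "Abstract" and "Acknowledgements" as section boundaries are all assumptions made for illustration. The only library calls used are pdfminer.six's extract_text and p_tqdm's p_map, both imported lazily so the splitting logic can be tried without either package installed.

```python
import re

# Assumed heading pattern (hypothetical): a top-level numbered heading such as
# "1 Introduction", or an unnumbered "Abstract" / "Acknowledgements" line.
# Subsection headings like "4.1 Dataset" do not match and stay inside the parent section.
HEADING_RE = re.compile(
    r"^(?:\d+\s+([A-Z][A-Za-z \-]{2,60})|(Abstract|Acknowledgements))[ \t]*$",
    re.MULTILINE,
)


def split_sections(text):
    """Split plain text into {lowercased heading: body} plus the full text under "all_contents"."""
    record = {}
    matches = list(HEADING_RE.finditer(text))
    for i, match in enumerate(matches):
        # A section body runs from the end of its heading to the start of the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        name = (match.group(1) or match.group(2)).lower()
        record[name] = " ".join(text[match.end():end].split())
    record["all_contents"] = " ".join(text.split())
    return record


def pdf_to_record(paper_id, title, pdf_path):
    """Extract text from one PDF and build a record shaped like the example above."""
    from pdfminer.high_level import extract_text  # requires pdfminer.six

    record = {"id": paper_id, "title": title}
    record.update(split_sections(extract_text(pdf_path)))
    return record


def convert_all(paper_ids, titles, pdf_paths):
    """Convert many PDFs in parallel with a progress bar (one possible use of p_tqdm)."""
    from p_tqdm import p_map  # requires p_tqdm

    return p_map(pdf_to_record, paper_ids, titles, pdf_paths)


sample = (
    "GraphDialog\nAbstract\nEnd-to-end systems.\n"
    "1 Introduction\nTask-oriented dialogue.\n"
    "2 Proposed Model\nThree components.\n2.1 Encoder\nDetails.\n"
    "Acknowledgements\nThanks.\n"
)
sections = split_sections(sample)
```

A fixed pattern like this will misfire on body lines that happen to start with a number, so real PDFs are likely to need a stricter heuristic.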