
This repository contains the code to reconstruct the training dataset from NLP/ML papers in PDF format, together with their corresponding slides.

Primary language: Python. License: Apache License 2.0.

document2slides

This code accompanies the NAACL 2021 paper "D2S: Document-to-Slide Generation Via Query-Based Text Summarization".

This repository contains:

  1. sciduet-build: code to reconstruct the training dataset from NLP/ML papers in PDF format, together with their corresponding slides
  2. SciDuet-ACL: the preprocessed ACL training data
  3. Derivability annotations, together with the trained classifier
  4. d2s-model: code to train and evaluate the automatic slide generation system

Edward Sun, Yufang Hou, Dakuo Wang, Yunfeng Zhang, Nancy X.R. Wang. D2S: Document-to-Slide Generation Via Query-Based Text Summarization. In Proceedings of the 18th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021), Online, 6-11 June 2021.