Datasets

Over years of research, our group has developed several datasets for natural language processing, and we are happy to share them. Download links for the datasets and their related papers are listed below.

Summarization Datasets

  1. Dataset for Multimodal Summarization with Multimodal Output [paper][GoogleDrive]
  2. Dataset for Multimodal Sentence Summarization [paper][GoogleDrive]
  3. Dataset for Multimodal Summarization [paper][OneDrive]
  4. Dataset for Multimodal Summarization [paper][GitHub]
  5. Datasets for Customer Service Dialogue Summarization [paper][GitHub]

Sentiment Analysis Datasets

  1. Dataset for Personalized Review Summarization [paper][OneDrive]
  2. Dataset for Document-level Multi-aspect Sentiment Classification [paper][OneDrive]
  3. Dataset for Document-level Sentiment Classification [paper][OneDrive]

Representation Learning Datasets

  1. Dataset for Chinese Sentence Representation [paper][GitHub]
  2. Dataset for Phrase Representation [paper][GitHub]

Others

  1. Dataset for Speech Translation [paper][Baidu Link, Password: bva0]