Baseline Systems of DuReader Dataset

DuReader

DuReader focuses on benchmarks and models for machine reading comprehension (MRC) in question answering.

Dataset:

DuReader 2.0: A new large-scale real-world and human sourced MRC dataset [Paper] [Code] [Leaderboard]

DuReader Robust: A dataset challenging models in (1) over-sensitivity, (2) over-stability and (3) generalization. [Paper] [Code] [Leaderboard]

DuReader Yes/No: A dataset challenging models in opinion polarity judgment. [Code] [Leaderboard]

DuReader Checklist: A dataset challenging model understanding capabilities in vocabulary, phrase, semantic role, reasoning. [Code] [Leaderboard]

DuQM: Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models. [Paper] [Code] [Leaderboard]

DuReader Robust, DuReader Yes/No, DuReader Checklist and DuQM can be downloaded from the official qianyan website. DuReader 2.0 can be downloaded by following the instructions in DuReader-2.0/README.md in this repository.

Models:

KT-NET: A machine reading comprehension (MRC) model that integrates knowledge from knowledge bases (KBs) into pre-trained contextualized representations. [Paper] [Code] [Leaderboard]

D-NET: A simple pre-training and fine-tuning framework that focuses on the generalization of machine reading comprehension (MRC) models. [Paper] [Code] [Leaderboard]

News

  • September 2021, we released DuQM, a Chinese dataset of linguistically perturbed natural questions for evaluating the robustness of question matching models, and it was included in qianyan.
  • June 2021, DuReader Robust, DuReader Yes/No and DuReader Checklist were included in qianyan.
  • May 2021, DuReader Robust (short paper) was accepted by ACL 2021.
  • March 2021, DuReader Checklist was released together with the DuReader Checklist challenge.
  • March 2020, DuReader Robust was released together with the DuReader Robust challenge.
  • December 2019, DuReader Yes/No was released together with the DuReader Yes/No challenge. After that, the DuReader Yes/No Individual Challenge and Team Challenge were held.
  • August 2019, D-NET was released and ranked first in the MRQA-2019 shared task.
  • July 2019, KT-NET was accepted by ACL 2019.
  • March 2019, the second MRC challenge was held based on DuReader 2.0, including hard samples in the test set.
  • April 2018, DuReader 2.0 was accepted by ACL 2018 at the Workshop on Machine Reading for Question Answering.
  • March 2018, the first MRC challenge was held based on DuReader 2.0.

Detailed Description

DuReader contains four datasets: DuReader 2.0, DuReader Robust, DuReader Yes/No and DuReader Checklist. The main features of these datasets include:

  • Real questions, real articles, real answers and real application scenarios;
  • Rich question types, including entity, number, opinion, etc.;
  • Various task types, including span-based tasks and classification tasks;
  • Rich task challenges, including model retrieval capability, model robustness, model checklist, etc.

DuReader 2.0 : Real question, Real article, Real answer

[Paper] [Code] [Leaderboard]

DuReader is a new large-scale, real-world, human-sourced MRC dataset in Chinese. It focuses on real-world open-domain question answering. Its advantages over existing datasets can be summarized as follows: real questions, real articles, real answers, real application scenarios and rich annotation.

KT-NET: Integrate knowledge into pre-trained LMs.

[Paper] [Code] [Leaderboard]

KT-NET (Knowledge and Text fusion NET) is a machine reading comprehension (MRC) model that integrates knowledge from knowledge bases (KBs) into pre-trained contextualized representations. The model was proposed in the ACL 2019 paper "Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension".

D-NET: Model generalization

[Paper] [Code] [Leaderboard]

D-NET is a simple system that Baidu submitted to the MRQA (Machine Reading for Question Answering) 2019 Shared Task, which focused on the generalization of machine reading comprehension (MRC) models. The system is built on a pre-training and fine-tuning framework. Pre-trained language models and multi-task learning are explored to improve the generalization of MRC models. D-NET ranked first among all participants in terms of averaged F1 score.

DuReader Robust: Model Robustness

[Paper] [Code] [Leaderboard]

DuReader Robust is designed to challenge MRC models in the following aspects: (1) over-sensitivity, (2) over-stability and (3) generalization. DuReader Robust has another advantage over previous datasets: its questions and documents come from Baidu Search. It therefore exposes the robustness issues that MRC models face when applied to real-world scenarios.

DuReader Yes/No: Opinion Yes/No Questions

[Code] [Leaderboard]

Span-based MRC tasks use the F1 and EM (exact match) metrics to measure the difference between predicted answers and labeled answers. However, these metrics do not measure opinion polarity well. DuReader Yes/No is proposed to challenge MRC models on opinion polarity, compensating for this shortcoming of existing MRC tasks and allowing a more reasonable evaluation of existing models.
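To illustrate why span metrics can miss opinion polarity, here is a minimal sketch of EM and F1 for a single predicted answer. It is not the official DuReader evaluation script: the function names `exact_match` and `f1_score` are hypothetical, and splitting answers into individual characters is an assumption chosen because the answers are in Chinese.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the two answers are identical after trimming whitespace."""
    return float(prediction.strip() == reference.strip())


def f1_score(prediction: str, reference: str) -> float:
    """Character-level F1 between a predicted and a labeled answer.

    Assumption: each character is treated as one token. A real evaluation
    script may tokenize and normalize answers differently.
    """
    pred_tokens = list(prediction.strip())
    ref_tokens = list(reference.strip())
    if not pred_tokens or not ref_tokens:
        # Two empty answers match; one empty answer scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts the characters shared by both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # "可以" (can) vs. "不可以" (cannot): opposite opinions, large overlap.
    print(exact_match("不可以", "可以"))
    print(f1_score("不可以", "可以"))
```

Under these assumptions, the prediction "不可以" against the label "可以" gets an EM of 0 but an F1 of about 0.8, even though the two answers express opposite opinions. A classification-style task with explicit polarity labels avoids this kind of misleading partial credit.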

DuReader Checklist: Natural Language Understanding Capabilities

[Code] [Leaderboard]

DuReader Checklist is a high-quality Chinese machine reading comprehension dataset for real application scenarios. It is designed to challenge natural language understanding capabilities from multiple aspects via systematic evaluation (i.e., a checklist), including the understanding of vocabulary, phrases, semantic roles, reasoning and so on.

DuQM: Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models

[Paper] [Code] [Leaderboard]

DuQM is a Chinese robustness dataset for question matching. It contains natural questions with linguistic perturbations to evaluate the robustness of question matching models. DuQM is designed to be fine-grained, diverse and natural, and it contains 3 categories and 13 subcategories with 32 linguistic perturbations.

Dataset and Evaluation Tools

We have released a dataset loading and evaluation tool named qianyan. You can use this package by following the instructions in the qianyan repo.

Copyright and License

Copyright 2017 Baidu.com, Inc. All Rights Reserved

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contact Information

For help or issues using DuReader, including datasets and baselines, please submit a GitHub issue.

For other communication or cooperation, please contact Jing Liu (liujing46@baidu.com) or Hongyu Li (lihongyu04@baidu.com).