Negation and Speculation Corpora in NLP

The table below lists corpora annotated for negation and speculation, covering multiple languages and domains.

Ref.  Year  Corpus                           Language              Domain                                   Size                  Neg.  Spec.  Avail.
1     2007  BioInfer                         English               Biomedical                               1,100
2     2008  GENIA                            English               Biomedical                               9,372                              Link
3     2008  BioScope                         English               Biomedical                               20,924                             Link
4     2010  CoNLL-2010                       English               Biological, Wikipedia                    40,289                             Link
5     2010  Product Review                   English               Review                                   2,111
6     2010  Stockholm EPR                    Swedish               Clinical                                 6,740
7     2011  PropBank FOC                     English               Journal stories                          3,779
8     2012  SFU Review                       English               Review                                   17,263                             Link
9     2012  ConanDoyle-neg                   English               Short stories                            4,423                              Link
10    2014  hUnCertainty                     Hungarian             Misc.                                    15,203
11    2014  Review and Newspaper             Japanese              Review, Newspaper                        2,147                              Link
12    2014  EMC                              Dutch                 Clinical                                 12,888 medical terms
13    2015  Twitter Negation                 English               Tweets                                   4,000
14    2015  CNeSp                            Chinese               Literature, Reviews, Financial articles  16,841
15    2016  DT-Neg                           English               Dialogues                                27,785 responses                   Link
16    2016  EMR                              Chinese               Biomedical                               36,828
17    2016  GNSC                             German                Biomedical                               2,234
18    2016  BioArabic                        Arabic                Biomedical                               10,165
19    2017  IULA                             Spanish               Biomedical                               3,194                              Link
20    2017  UHU-HUVR                         Spanish               Clinical                                 8,412
21    2017  SFU ReviewSP NEG                 Spanish               Review                                   9,455                              Link
22    2017  News (Fact-Ita Bank) and Tweets  Italian               News stories, Tweets                     1,591
23    2018  SFU SOCC                         English               Opinion                                  1,043 comments                     Link
24    2018  NegPar                           English-Chinese       Short stories                            5,520 (E), 5,005 (C)               Link
25    2019  ESSAI                            French                Medical                                  6,547
26    2019  CAS                              French                Medical                                  3,811
27    2020  REBEC                            Brazilian Portuguese  Clinical                                 3,228
28    2020  Clinical narratives              Brazilian Portuguese  Clinical                                 9,808
29    2020  NUBES                            Spanish               Biomedical                               29,682                             Link
30    2020  NewsComm                         Spanish               Comments                                 4,980                              Link
31    2021  T-MexNeg                         Mexican Spanish       Tweets                                   13,704                             Link
32    2021  ArNeg                            Arabic                Wikipedia, Biography, Religion           6,000
Mahany, A.; Khaled, H.; Elmitwally, N.S.; Aljohani, N.; Ghoniemy, S. Negation and Speculation in NLP: A Survey, Corpora, Methods, and Applications. Appl. Sci. 2022, 12, 5209.