A curated list of public datasets that focus on sentence classification in academic papers or abstracts.

Name | Year | Domains | Source | Annotated by | #Papers | Text Type | Classes |
---|---|---|---|---|---|---|---|
CODA-19 | 2020 | Biomedical Sciences | CORD-19 | Crowdworkers | 10,966 | abstracts | (4+1) BACKGROUND, PURPOSE, METHOD, FINDING/CONTRIBUTION, OTHER |
cs.combined | 2020 | Computer Science (cs.NI + cs.TLT + cs.TPAMI) | arXiv + IEEE Transactions | Experts | 450 | abstracts | (3) BACKGROUND, TECHNIQUE, OBSERVATION |
CSABSTRUCT | 2019 | Computer Science | Semantic Scholar corpus | Crowdworkers | 2,189 | abstracts | (4+1) BACKGROUND, OBJECTIVE, METHOD, RESULT, OTHER |
CS Abstracts | 2019 | Computer Science | arXiv | Crowdworkers | 654 | abstracts | (5) BACKGROUND, OBJECTIVE, METHODS, RESULTS, CONCLUSIONS |
PubMed PICO Element Detection Dataset | 2018 | Biomedical Sciences | PubMed | Author | 24,668 | abstracts | (7) AIM, PARTICIPANTS, INTERVENTION, OUTCOME, METHOD, RESULTS, CONCLUSION |
PubMed 200k RCT | 2017 | Biomedical Sciences | PubMed | Author | 200,000 | abstracts | (5) BACKGROUND, OBJECTIVE, METHOD, RESULT, CONCLUSION |
PubMed 20k RCT | 2017 | Biomedical Sciences | PubMed | Author | 20,000 | abstracts | (5) BACKGROUND, OBJECTIVE, METHOD, RESULT, CONCLUSION |
MCCRA (Multi-CoreSC CRA corpus) | 2016 | Cancer Risk Assessment (CRA) | selected by a domain expert | Experts | 50 | full paper | (11) HYPOTHESIS, MOTIVATION, BACKGROUND, GOAL, OBJECT, METHOD, EXPERIMENT, MODEL, OBSERVATION, RESULT, CONCLUSION |
DRI Corpus (Dr. Inventor Multi-Layer Scientific Corpus) | 2015 | Computer Graphics | a larger collection provided by domain experts | Experts | 40 | full paper | (5) BACKGROUND, CHALLENGE, APPROACH, OUTCOME, FUTURE WORK |
NICTA-PIBOSO | 2011 | Biomedical Sciences | PubMed | Experts | 1,000 | abstracts | (5+1) BACKGROUND, POPULATION, INTERVENTION, OUTCOME, STUDY DESIGN, OTHER |
ART Corpus (CoreSC) | 2010 | Physical Chemistry and Biochemistry | Royal Society of Chemistry (RSC) Publishing | Experts | 225 | full paper | (11) HYPOTHESIS, MOTIVATION, BACKGROUND, GOAL, OBJECT, METHOD, EXPERIMENT, MODEL, OBSERVATION, RESULT, CONCLUSION |
AZ Corpus | 2002 | Computational Linguistics | arXiv | Experts & Author | 80 | full paper | (6+1) AIM, TEXTUAL, OWN, BACKGROUND, CONTRAST, BASIS, OTHER |