This repository provides an annotated corpus of clinical trial abstracts that supports schema-based relational information extraction, together with the code for calculating inter-annotator agreement and for the baseline information extraction method.

Primary language: Python

CT-Corpus

This repository contains:

Annotated corpus

The corpus comprises 211 annotated clinical trial abstracts that support schema-based (i.e., template slot-filling) relational information extraction. The annotation schema is based on the C-TrO ontology for the aggregation of clinical trials, and the annotation was carried out with SANTO, a tool for schematic annotation. The abstracts come from published clinical trials on glaucoma and type 2 diabetes mellitus that are available on PubMed. The corpus and the annotation guidelines are in the Data directory.
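To illustrate what template slot-filling means here, the sketch below models a trial as nested templates whose slots are filled with text from an abstract. All class names, slot names, and values are hypothetical and chosen only for illustration; they are not taken from the C-TrO ontology, the annotation guidelines, or the corpus files.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical templates: the real schema is defined by C-TrO,
# not by these illustrative classes.
@dataclass
class Medication:
    drug_name: Optional[str] = None
    dose_value: Optional[str] = None
    dose_unit: Optional[str] = None


@dataclass
class Arm:
    # A slot can hold another template, which is what makes
    # the extraction relational rather than flat.
    medication: Optional[Medication] = None
    num_patients: Optional[str] = None


@dataclass
class ClinicalTrial:
    arms: List[Arm] = field(default_factory=list)


# Invented example values, not an excerpt from the corpus.
trial = ClinicalTrial(
    arms=[
        Arm(
            medication=Medication(
                drug_name="latanoprost", dose_value="0.005", dose_unit="%"
            ),
            num_patients="120",
        ),
        Arm(medication=Medication(drug_name="timolol"), num_patients="118"),
    ]
)

for arm in trial.arms:
    print(arm.medication.drug_name, arm.num_patients)
```

Slots that the abstract does not mention simply stay empty (`None`), as with the dose of the second arm.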

Code

The code covers the inter-annotator agreement calculation and the baseline method for recognizing single entities. It is in the Code directory.
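As a rough illustration of an agreement calculation, the sketch below computes Cohen's kappa for two annotators who labeled the same sequence of items. This is an assumption made for illustration: this README does not state which agreement measure the repository's code uses, and the function name `cohen_kappa` and the labels in the example are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    n = len(labels_a)
    if n == 0:
        raise ValueError("no items to compare")

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both annotators pick the
    # same label if each labels at random with their own frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical token-level labels from two annotators.
annotator_1 = ["Drug", "O", "O", "Dose"]
annotator_2 = ["Drug", "O", "Dose", "Dose"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so it is lower than the plain proportion of matching labels whenever some agreement could have occurred by chance.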