This repository contains the source code for a Dutch speculation detector for clinical text, developed within the
ACCUMULATE project. Speculation is detected for the clinical concepts found within a sentence, rather than at the token level.
Requirements:
- Python 3
- Frog
The preprocessing module processes raw clinical text using Frog and integrates the preprocessed output with user-provided concept annotations on the raw text. Gold standard speculation annotations can be included for later evaluation.
from preprocessing import PreprocessCorpus
preprocessor = PreprocessCorpus()
preprocessed_instances = preprocessor(file_ids)
# file_ids = list of paths to .json files containing one dictionary each with the relevant input data
# example input dictionary:
# input_dictionary['text'] = raw clinical text to be processed by Frog
# input_dictionary['concept_spans'] = [{'begin': start_idx, 'end': end_idx},
#                                      {'begin': start_idx, 'end': end_idx}]
# if gold standard annotations are present for speculation:
# input_dictionary['speculation_status'] = [True, False]
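The input format described in the comments above can be illustrated with a small script that writes one such .json file. This is a minimal sketch, not part of the repository: the helper `concept_span`, the example sentence, and the file name are hypothetical, and it assumes that `begin` and `end` are character offsets into the raw text with an exclusive `end`, which the description above does not confirm.

```python
import json
import tempfile
from pathlib import Path


def concept_span(text, phrase):
    """Hypothetical helper: offsets of the first occurrence of phrase.

    Assumption: character offsets, with an exclusive end offset.
    """
    begin = text.index(phrase)
    return {'begin': begin, 'end': begin + len(phrase)}


# Example sentence: "Possibly pneumonia, no fever."
text = 'Mogelijk pneumonie, geen koorts.'

input_dictionary = {
    'text': text,
    'concept_spans': [concept_span(text, 'pneumonie'),
                      concept_span(text, 'koorts')],
    # Optional gold standard: here one boolean per concept span.
    'speculation_status': [True, False],
}

# Write the dictionary to its own .json file in a temporary directory.
output_dir = Path(tempfile.mkdtemp())
file_path = output_dir / 'example_0.json'
with open(file_path, 'w', encoding='utf-8') as f:
    json.dump(input_dictionary, f, ensure_ascii=False)

# List of paths in the form expected by the preprocessor call above.
file_ids = [str(file_path)]
```

The resulting `file_ids` list has the shape that `preprocessor(file_ids)` is described as taking; leave out the `speculation_status` key when no gold standard is available.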
from speculation_tagger import SpeculationTagger
# if gold standard is included, gold_included should be True, else False
tagger = SpeculationTagger(gold_included)
tagged_sentences = tagger(preprocessed_instances)
from speculation_detector import SpeculationDetector, SpeculationDetectorEvaluation
# choose model from ['forward', 'backward', 'forward_punct', 'backward_punct', 'finetuned_baseline', 'finetuned_hybrid']
model = 'forward'  # example choice; any name from the list above
sentence_instances = tagged_sentences['sentence_instances']
# usage for data WITHOUT gold standard speculation annotations
detector = SpeculationDetector()
instances_detection_data = detector.detect(sentence_instances, model)
# usage for data WITH gold standard speculation annotations
detector = SpeculationDetectorEvaluation()
results = detector(sentence_instances, model)
- forward: Matches the first concept following a detected speculation cue.
- backward: Matches the first concept preceding a detected speculation cue.
- forward_punct: Matches all following concepts up to the first following punctuation mark.
- backward_punct: Matches all preceding concepts back to the first preceding punctuation mark.
- finetuned_baseline: For each cue separately, applies the most effective of the four baseline models.
- finetuned_hybrid: For every cue where this improves on the fine-tuned baseline, replaces the baseline model with a rule selected from simple rules on the Frog dependency parse.
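The four baseline matching rules described above can be sketched on a toy sentence. This is an illustration only and not the repository's implementation: the function `match_concepts` and the per-token label scheme ('cue', 'concept', 'punct', 'other') are hypothetical, and the sketch assumes that the two rules without `_punct` ignore punctuation entirely.

```python
BASELINE_MODELS = ('forward', 'backward', 'forward_punct', 'backward_punct')


def match_concepts(labels, cue_index, model):
    """Return the token indices of concepts matched to the cue at cue_index.

    labels: one label per token, each 'cue', 'concept', 'punct' or 'other'
            (a hypothetical simplification of the preprocessed input).
    """
    if model not in BASELINE_MODELS:
        raise ValueError('unknown baseline model: {}'.format(model))
    step = 1 if model.startswith('forward') else -1
    until_punct = model.endswith('_punct')

    matched = []
    i = cue_index + step
    while 0 <= i < len(labels):
        if until_punct and labels[i] == 'punct':
            break  # the *_punct rules stop at the first punctuation mark
        if labels[i] == 'concept':
            matched.append(i)
            if not until_punct:
                break  # the plain rules keep only the first concept
        i += step
    return sorted(matched)


# "Mogelijk pneumonie of bronchitis , geen koorts ."
labels = ['cue', 'concept', 'other', 'concept', 'punct',
          'other', 'concept', 'punct']

first_following = match_concepts(labels, 0, 'forward')
all_following = match_concepts(labels, 0, 'forward_punct')
```

Here `first_following` contains only the index of "pneumonie", while `all_following` also contains "bronchitis" and stops at the comma, so "koorts" is not matched. The backward rules scan in the opposite direction from the cue.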