
MultiTask-T5_AE 📖

*Exploring a Unified Sequence-To-Sequence Transformer for Medical Product Safety Monitoring in Social Media*, accepted for publication in the EMNLP 2021 Findings track, explores sequence-to-sequence transformers for detecting and extracting Adverse Events (AEs) from various sources for medical product safety monitoring.
Adverse Events (AE) are harmful events resulting from the use of medical products. Although social media may be crucial for early AE detection, the sheer scale of this data makes it logistically intractable to analyze using human agents, with NLP representing the only low-cost and scalable alternative. In this paper, we frame AE Detection and Extraction as a sequence-to-sequence problem using the T5 model architecture and achieve strong performance improvements over competitive baselines on several English benchmarks (F1 = 0.71, 12.7% relative improvement for AE Detection; Strict F1 = 0.713, 12.4% relative improvement for AE Extraction).

Given an input sequence of words that potentially contains drug, dosage, and AE mentions, we frame the AE detection (i.e., binary classification) and extraction (i.e., span detection) tasks as sequence-to-sequence problems, fine-tuning T5 to generate Y, which is either the classification label or the text span containing the AE. Examples of the prefixes used are shown in the figure below:

*(Figure: T5_AE task prefixes)*
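The prefix-based input formatting can be sketched as follows. This is a minimal illustration; the prefix strings and the helper below are assumptions for demonstration, not the exact prompts used in this repository (check the training scripts for those):

```python
# Sketch of T5-style task prefixing for AE detection/extraction.
# NOTE: the prefix strings here are illustrative assumptions, not the
# exact prompts used in this repository.
TASK_PREFIXES = {
    "ae_detection": "ae detection",    # binary classification -> "positive"/"negative"
    "ae_extraction": "ae extraction",  # span detection -> the AE text span
}

def make_t5_example(task: str, text: str, target: str) -> dict:
    """Format one (input, target) pair for seq-to-seq fine-tuning."""
    return {
        "input_text": f"{TASK_PREFIXES[task]}: {text}",
        "target_text": target,
    }

ex = make_t5_example("ae_detection",
                     "this med gave me a terrible headache",
                     "positive")
print(ex["input_text"])  # ae detection: this med gave me a terrible headache
```

With a single model, the task is selected purely by the prefix, which is what allows the multi-task training described below.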

Table of Contents

Datasets

Installation

Usage

How to Cite

References

Datasets

The datasets used for experimentation are listed below:

1. SMM4H

This dataset was introduced for the Shared Tasks on AE in the Workshop on Social Media Mining for Health Applications (SMM4H) (Weissenbacher et al., 2018). It is composed of Twitter posts: typically short, informal texts with non-standard orthography. It contains annotations for both detection (i.e., Task 1, classification) and extraction (i.e., Task 2, NER) of Adverse Events. The importer for SMM4H Task 1 (AE Detection) expects the file SMM4H19_Task1.csv in the /src/data/datasets/SMM4H_Task1/ folder (column names: tweet_id, tweet, label). Similarly, the importer for Task 2 expects the file SMM4H19_Task2.csv in the /src/data/datasets/SMM4H_Task2/ folder (column names: tweet_id, begin, end, type, extraction, drug, tweet, meddra_code, meddra_term).
SMM4H Task 2 Dataset Splits: /src/data/splits/SMM4H_Task2
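As a quick sanity check before running the importers, the expected column layout can be validated with a small stdlib snippet. The file names and column lists follow this README; the validator itself is just an illustrative helper (demonstrated on an in-memory file rather than the real dataset):

```python
import csv
import io

# Columns the SMM4H importers expect (per this README).
EXPECTED = {
    "SMM4H19_Task1.csv": ["tweet_id", "tweet", "label"],
    "SMM4H19_Task2.csv": ["tweet_id", "begin", "end", "type", "extraction",
                          "drug", "tweet", "meddra_code", "meddra_term"],
}

def check_columns(fileobj, expected):
    """Return True if the CSV header row matches the expected column names."""
    header = next(csv.reader(fileobj))
    return header == expected

# Demo on an in-memory file instead of the real dataset:
demo = io.StringIO("tweet_id,tweet,label\n1,some tweet,0\n")
print(check_columns(demo, EXPECTED["SMM4H19_Task1.csv"]))  # True
```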

| Dataset | Split | Total | Positive | Negative |
|---|---|---|---|---|
| SMM4H Task 1 (AE Detection) | All | 15,482 | 1,339 | 14,143 |
| | Train (80%) | 12,386 | 1,071 | 11,315 |
| | Validation (10%) | 1,548 | 134 | 1,414 |
| | Test (10%) | 1,548 | 134 | 1,414 |
| SMM4H Task 2 (AE Detection, AE Extraction and Drug Extraction) | All | 2,276 | 1,300 | 976 |
| | Train (60%) | 1,365 | 780 | 585 |
| | Validation (20%) | 455 | 260 | 195 |
| | Test (20%) | 456 | 260 | 196 |

2. CADEC

CADEC contains 1,250 medical forum posts annotated with patient-reported AEs. In this dataset, texts are long and informal, often deviating from English syntax and punctuation rules, and a single forum post may contain more than one AE. We adopted the training, validation, and test splits proposed by Dai et al. (2020). The importer for CADEC expects the zip file CADEC.zip in the /src/data/datasets/CADEC/ folder. The dataset is available at https://data.csiro.au/collections/collection/CIcsiro:10948/SQcadec/RP1/RS25/RORELEVANCE/STsearch-by-keyword/RI1/RT1/ (download the CADEC.v2.zip).
Dataset Splits: /src/data/splits/CADEC

| Dataset | Split | Total | Positive | Negative |
|---|---|---|---|---|
| CADEC (AE Detection, AE Extraction and Drug Extraction) | All | 1,250 | 1,105 | 145 |
| | Train (70%) | 875 | 779 | 96 |
| | Validation (15%) | 187 | 163 | 24 |
| | Test (15%) | 188 | 163 | 25 |

3. ADE Corpus v2

This dataset (Gurulingappa et al., 2012) contains case reports extracted from MEDLINE and was used for multi-task training, as it contains annotations for all tasks: drug, dosage, and AE detection and extraction. Splits are stratified to maintain an equal ratio of positive and negative examples across them. This dataset is prepared automatically by the code, which loads it via the Hugging Face datasets package.

| Dataset | Split | Total | Positive | Negative |
|---|---|---|---|---|
| ADE Corpus v2 (AE Detection) | All | 23,516 | 6,821 | 16,695 |
| | Train (60%) | 14,109 | 4,091 | 10,018 |
| | Validation (20%) | 4,703 | 1,365 | 3,338 |
| | Test (20%) | 4,704 | 1,365 | 3,339 |
| ADE Corpus v2 (AE Extraction) | All | 6,821 | 6,821 | - |
| | Train (60%) | 4,091 | 4,091 | - |
| | Validation (20%) | 1,365 | 1,365 | - |
| | Test (20%) | 1,365 | 1,365 | - |
| ADE Corpus v2 (Drug Extraction) | All | 7,100 | 7,100 | - |
| | Train (60%) | 4,260 | 4,260 | - |
| | Validation (20%) | 1,420 | 1,420 | - |
| | Test (20%) | 1,420 | 1,420 | - |
| ADE Corpus v2 (Drug Dosage Extraction) | All | 279 | - | - |
| | Train (60%) | 167 | - | - |
| | Validation (20%) | 56 | - | - |
| | Test (20%) | 56 | - | - |
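The stratified 60/20/20 splitting described above can be sketched with a short helper. This is an illustrative, stdlib-only sketch of the idea (shuffle within each label group, then cut by the split ratios); the repository's own preprocessing may differ in details:

```python
import random
from collections import defaultdict

def stratified_split(examples, label_key, ratios=(0.6, 0.2, 0.2), seed=42):
    """Split examples 60/20/20 while preserving the label ratio in each split.

    Illustrative sketch only; not the repository's actual preprocessing.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)

    train, val, test = [], [], []
    for _, group in by_label.items():
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

data = [{"text": f"t{i}", "label": int(i % 3 == 0)} for i in range(100)]
tr, va, te = stratified_split(data, "label")
print(len(tr), len(va), len(te))
```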

4. WEB-RADR

This dataset is a manually curated benchmark based on tweets. It is used exclusively to test the performance of the multi-task models, as it was originally introduced only for testing purposes (Dietrich et al., 2020). The importer for WEB-RADR expects the file WEB_RADR.csv in the folder /src/data/datasets/WEB_RADR/. (column names: tweet_id, tweet, label, extraction)

| Dataset | Split | Total | Positive | Negative |
|---|---|---|---|---|
| WEB-RADR (AE Detection and AE Extraction) | Test (100%) | 57,481 | 1,056 | 56,425 |

5. SMM4H-French

The SMM4H Twitter AE French dataset was introduced at SMM4H20 (https://www.aclweb.org/anthology/2020.smm4h-1.4.pdf). The importer expects the file SMM4H_French.csv in the folder /src/data/datasets/SMM4H_French/ (column names: tweet_id, tweet, label). This dataset is only used for testing zero-shot transfer learning.

| Dataset | Split | Total | Positive | Negative |
|---|---|---|---|---|
| SMM4H-French (AE Detection) | Test (100%) | 1,941 | 31 | 1,910 |

After all the datasets are placed in their respective folders, run the following command to load and prepare them for model input:

python src/prep_data.py

Installation

1. Set up a basic virtualenv

python3 -m venv t5_ade
source t5_ade/bin/activate

2. Install Requirements

cd ae-detect
pip install -r requirements.txt

Usage

1. Running Baseline BERT Models:

# Train models
python train_baseline.py
# Evaluate models
python eval_baseline.py

More details on the parameters that can be changed are given in train_baseline.py.

2. Running T5 Models:

# Single-task T5 model
python t5_train.py

# Multi-task T5 model
python t5_multi_task_train.py

# T5 evaluation
python t5_eval.py

There are a couple of options for running the multi-task setting, described in the script. The T5 model can be trained with the Task Balancing (TB) or Task plus Dataset Balancing (TDB) approach, using either the Proportional Mixing (PM) or Temperature Scaling (TS) strategy. For evaluation, the test set and the trained model path can be changed in the t5_eval.py script.
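The two mixing strategies can be sketched as follows: proportional mixing samples each task in proportion to its dataset size, while temperature scaling flattens that distribution so small datasets are not drowned out. This is a minimal illustration of the sampling weights only (the dataset sizes in the demo are hypothetical, and the repository's implementation may differ):

```python
def proportional_mixing(sizes):
    """Sample each task in proportion to its dataset size."""
    total = sum(sizes.values())
    return {task: n / total for task, n in sizes.items()}

def temperature_scaling(sizes, T=2.0):
    """Flatten the size distribution by raising counts to the power 1/T.

    T=1 recovers proportional mixing; larger T moves the rates toward
    uniform sampling, preventing large datasets from dominating training.
    """
    scaled = {task: n ** (1.0 / T) for task, n in sizes.items()}
    total = sum(scaled.values())
    return {task: s / total for task, s in scaled.items()}

# Hypothetical task sizes (a large detection set vs. a tiny dosage set):
sizes = {"ae_detection": 23516, "dosage_extraction": 279}
print(proportional_mixing(sizes))
print(temperature_scaling(sizes, T=2.0))
```

With T=2, the tiny dosage task's sampling rate rises well above its raw proportion, which is the point of temperature scaling in multi-task training.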

How to Cite

@inproceedings{raval2021exploring,
  title={Exploring a Unified Sequence-To-Sequence Transformer for Medical Product Safety Monitoring in Social Media},
  author={Raval, Shivam and Sedghamiz, Hooman and Santus, Enrico and Alhanai, Tuka and Ghassemi, Mohammad and Chersoni, Emmanuele},
  booktitle={The 2021 Conference on Empirical Methods in Natural Language Processing},
  year={2021},
  organization={Association for Computational Linguistics (ACL)}
}

References

[1] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

[2] Towards Zero-Shot Conditional Summarization with Adaptive Multi-Task Fine-Tuning

[3] Improving Adverse Drug Event Extraction with SpanBERT on Different Text Typologies