MultiTask-T5_AE 📖

The paper "Exploring a Unified Sequence-To-Sequence Transformer for Medical Product Safety Monitoring in Social Media", accepted for publication in the Findings track of EMNLP 2021, explores sequence-to-sequence transformers for detecting and extracting Adverse Events (AE) from various sources for medical product safety monitoring.
Adverse Events (AE) are harmful events resulting from the use of medical products. Although social media may be crucial for early AE detection, the sheer scale of this data makes it logistically intractable to analyze using human agents, with NLP representing the only low-cost and scalable alternative. In this paper, we frame AE Detection and Extraction as a sequence-to-sequence problem using the T5 model architecture and achieve strong performance improvements over competitive baselines on several English benchmarks (F1 = 0.71, 12.7% relative improvement for AE Detection; Strict F1 = 0.713, 12.4% relative improvement for AE Extraction).

Given an input sequence of words that potentially contains drug, dosage and AE mentions, we frame the AE detection (i.e., binary classification) and extraction (i.e., span detection) tasks as seq-to-seq problems, fine-tuning T5 to generate an output sequence Y, which is either the classification label or the text span containing the AE. A task prefix prepended to the input tells the model which task to perform. Examples of the prefixes used are shown in the accompanying figure.
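As a rough illustration of this framing, the sketch below turns one tweet into text-to-text training pairs for both tasks. The prefix strings, the label wording, and the `make_example` helper are assumptions made for illustration only; the strings actually used by the repository may differ.

```python
# Hypothetical task prefixes; the repository's actual prefix strings may differ.
PREFIXES = {
    "detection": "ae detection: ",
    "extraction": "ae extraction: ",
}


def make_example(task, text, target):
    """Build one (input, target) text pair for a text-to-text model."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return {
        "input": PREFIXES[task] + text.strip(),
        "target": str(target).strip(),
    }


tweet = "this med gave me a terrible headache"

# Detection: the target is a class label written out as text (label wording is assumed).
detection = make_example("detection", tweet, "adverse event")

# Extraction: the target is the text span that mentions the AE.
extraction = make_example("extraction", tweet, "terrible headache")
```

Because both tasks share the same input and output format, a single model can be trained on a mixture of them, which is what the multi-task setting relies on.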

Datasets

The datasets used for experimentation are listed below:

1. SMM4H

This dataset was introduced for the Shared Tasks on AE in the Workshop on Social Media Mining for Health Applications (SMM4H) (Weissenbacher et al., 2018). It is composed of Twitter posts, typically short, informal texts with non-standard orthography, and contains annotations for both detection (i.e., Task 1, classification) and extraction (i.e., Task 2, NER) of Adverse Events. Preparing the AE Detection dataset for SMM4H Task 1 requires the file SMM4H19_Task1.csv in the /src/data/datasets/SMM4H_Task1/ folder (column names: tweet_id, tweet, label). Similarly, the importer function for Task 2 expects the file SMM4H19_Task2.csv in the /src/data/datasets/SMM4H_Task2/ folder (column names: tweet_id, begin, end, type, extraction, drug, tweet, meddra_code, meddra_term).
SMM4H Task 2 Dataset Splits: /src/data/splits/SMM4H_Task2

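Dataset	Split	Total	Positive	Negative
SMM4H Task 1 (AE Detection)	All	15,482	1,339	14,143
SMM4H Task 1 (AE Detection)	Train (80%)	12,386	1,071	11,315
SMM4H Task 1 (AE Detection)	Validation (10%)	1,548	134	1,414
SMM4H Task 1 (AE Detection)	Test (10%)	1,548	134	1,414
SMM4H Task 2 (AE Detection, AE Extraction and Drug Extraction)	All	2,276	1,300	976
SMM4H Task 2 (AE Detection, AE Extraction and Drug Extraction)	Train (60%)	1,365	780	585
SMM4H Task 2 (AE Detection, AE Extraction and Drug Extraction)	Validation (20%)	455	260	195
SMM4H Task 2 (AE Detection, AE Extraction and Drug Extraction)	Test (20%)	456	260	196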

2. CADEC

CADEC contains 1,250 medical forum posts annotated with patient-reported AEs. In this dataset, texts are long and informal, often deviating from English syntax and punctuation rules. Forum posts may contain more than one AE. For our purposes, we adopted the training, validation, and test splits proposed by Dai et al. (2020). The importer for CADEC expects a zip file CADEC.zip in the /src/data/datasets/CADEC/ folder. The dataset is available at: https://data.csiro.au/collections/collection/CIcsiro:10948/SQcadec/RP1/RS25/RORELEVANCE/STsearch-by-keyword/RI1/RT1/ (download the CADEC.v2.zip)
Dataset Splits: /src/data/splits/CADEC

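Dataset	Split	Total	Positive	Negative
CADEC (AE Detection, AE Extraction and Drug Extraction)	All	1,250	1,105	145
CADEC (AE Detection, AE Extraction and Drug Extraction)	Train (70%)	875	779	96
CADEC (AE Detection, AE Extraction and Drug Extraction)	Validation (15%)	187	163	24
CADEC (AE Detection, AE Extraction and Drug Extraction)	Test (15%)	188	163	25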

3. ADE Corpus v2

This dataset (Gurulingappa et al., 2012) contains case reports extracted from MEDLINE. It was used for multi-task training, as it contains annotations for all tasks: drugs, dosage, AE detection and extraction. Splits are stratified to maintain an equal ratio of positive and negative examples. The code prepares this dataset automatically by loading it from the Hugging Face datasets package.

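Dataset	Split	Total	Positive	Negative
ADE Corpus v2 (AE Detection)	All	23,516	6,821	16,695
ADE Corpus v2 (AE Detection)	Train (60%)	14,109	4,091	10,018
ADE Corpus v2 (AE Detection)	Validation (20%)	4,703	1,365	3,338
ADE Corpus v2 (AE Detection)	Test (20%)	4,704	1,365	3,339
ADE Corpus v2 (AE Extraction)	All	6,821	6,821	-
ADE Corpus v2 (AE Extraction)	Train (60%)	4,091	4,091	-
ADE Corpus v2 (AE Extraction)	Validation (20%)	1,365	1,365	-
ADE Corpus v2 (AE Extraction)	Test (20%)	1,365	1,365	-
ADE Corpus v2 (Drug Extraction)	All	7,100	7,100	-
ADE Corpus v2 (Drug Extraction)	Train (60%)	4,260	4,260	-
ADE Corpus v2 (Drug Extraction)	Validation (20%)	1,420	1,420	-
ADE Corpus v2 (Drug Extraction)	Test (20%)	1,420	1,420	-
ADE Corpus v2 (Drug Dosage Extraction)	All	279	-	-
ADE Corpus v2 (Drug Dosage Extraction)	Train (60%)	167	-	-
ADE Corpus v2 (Drug Dosage Extraction)	Validation (20%)	56	-	-
ADE Corpus v2 (Drug Dosage Extraction)	Test (20%)	56	-	-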
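The stratified 60/20/20 split described for ADE Corpus v2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's splitting code: the `stratified_split` helper, its seed, and its rounding are hypothetical, so it will not necessarily reproduce the exact counts in the table.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split example indices into train/validation/test with similar label ratios."""
    # Group example indices by their class label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train, valid, test = [], [], []
    for indices in by_label.values():
        # Shuffle within each class, then cut each class with the same fractions.
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_valid = int(len(indices) * fractions[1])
        train += indices[:n_train]
        valid += indices[n_train:n_train + n_valid]
        test += indices[n_train + n_valid:]  # the remainder goes to the test split
    return train, valid, test


# Toy data: 30 positive and 70 negative examples.
toy_labels = [1] * 30 + [0] * 70
train_idx, valid_idx, test_idx = stratified_split(toy_labels)
```

Splitting each class separately is what keeps the positive-to-negative ratio roughly constant across the three splits.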

4. WEB-RADR

This dataset is a manually curated benchmark based on tweets. It is used exclusively to test the performance of the multi-task models, as it was originally introduced only for testing purposes (Dietrich et al., 2020). The importer for WEB-RADR expects the file WEB_RADR.csv in the folder /src/data/datasets/WEB_RADR/. (column names: tweet_id, tweet, label, extraction)

Dataset	Split	Total	Positive	Negative
WEB-RADR (AE Detection and AE Extraction)	Test (100%)	57,481	1,056	56,425

5. SMM4H-French

The SMM4H Twitter AE French dataset was introduced in SMM4H20 (https://www.aclweb.org/anthology/2020.smm4h-1.4.pdf). The importer expects the file SMM4H_French.csv in the folder /src/data/datasets/SMM4H_French/ (column names: tweet_id, tweet, label). This dataset is used only for testing zero-shot transfer learning.

Dataset	Split	Total	Positive	Negative
SMM4H-French (AE Detection)	Test (100%)	1,941	31	1,910

Once all the datasets have been placed in their respective folders, run the following command to load and prepare them as model input.

python src/prep_data.py
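Before running the preparation script, it can help to confirm that every manually supplied file is in place and has the expected header. The sketch below is not part of the repository: the `check_datasets` helper is hypothetical, and it assumes the dataset folders listed above are resolved relative to the repository root (i.e. `src/data/datasets`). File names and column names are copied from the dataset descriptions.

```python
import csv
from pathlib import Path

# Expected files and header columns, taken from the dataset descriptions above.
# A value of None means only the file's existence is checked.
EXPECTED = {
    "SMM4H_Task1/SMM4H19_Task1.csv": ["tweet_id", "tweet", "label"],
    "SMM4H_Task2/SMM4H19_Task2.csv": [
        "tweet_id", "begin", "end", "type", "extraction",
        "drug", "tweet", "meddra_code", "meddra_term",
    ],
    "CADEC/CADEC.zip": None,
    "WEB_RADR/WEB_RADR.csv": ["tweet_id", "tweet", "label", "extraction"],
    "SMM4H_French/SMM4H_French.csv": ["tweet_id", "tweet", "label"],
}


def check_datasets(root):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for rel_path, columns in EXPECTED.items():
        path = Path(root) / rel_path
        if not path.is_file():
            problems.append(f"missing file: {rel_path}")
            continue
        if columns is None:
            continue
        # Read only the first row of the CSV and compare it with the expected columns.
        with path.open(newline="", encoding="utf-8") as handle:
            header = [name.strip() for name in next(csv.reader(handle), [])]
        missing = [name for name in columns if name not in header]
        if missing:
            problems.append(f"{rel_path}: missing columns {missing}")
    return problems
```

Calling `check_datasets("src/data/datasets")` from the repository root returns one message per missing file or column. ADE Corpus v2 is not listed because it is downloaded automatically.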

Installation

1. Setup Basic virtualenv

python3 -m venv t5_ade
source t5_ade/bin/activate

2. Install Requirements

cd ae-detect
pip install -r requirements.txt

Usage

1. Running Baseline BERT Models:

# Train models
python train_baseline.py
# Evaluate models
python eval_baseline.py

The parameters that can be changed are described in train_baseline.py.

2. Running T5 Models:

# Single-task T5 model
python t5_train.py

# Multi-task T5 model
python t5_multi_task_train.py

# T5 evaluation
python t5_eval.py

There are a couple of options for running the multi-task setting, which are described in the script. The T5 model can be trained with the Task Balancing (TB) or Task plus Dataset Balancing (TDB) approach, using either the Proportional Mixing (PM) or the Temperature Scaling (TS) strategy. For evaluation, the test set and the trained model path can be changed in the t5_eval.py script.
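To give an intuition for the two mixing strategies, the sketch below computes per-task sampling rates in the way the T5 paper defines examples-proportional mixing and temperature-scaled mixing. It is an illustration only: the function names, the `limit` parameter, and the temperature value are assumptions, the repository's own implementation may differ, and the TB/TDB balancing step is not covered. The example sizes are the ADE Corpus v2 training-split counts from the table above.

```python
def proportional_mixing(sizes, limit=None):
    """Sampling rate proportional to dataset size, optionally capped at `limit`."""
    capped = {
        name: (min(count, limit) if limit is not None else count)
        for name, count in sizes.items()
    }
    total = sum(capped.values())
    return {name: count / total for name, count in capped.items()}


def temperature_scaling(sizes, temperature=2.0, limit=None):
    """Raise each proportional rate to 1/temperature, then renormalize to sum to 1."""
    base = proportional_mixing(sizes, limit)
    scaled = {name: rate ** (1.0 / temperature) for name, rate in base.items()}
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


# Training-split sizes of the ADE Corpus v2 tasks.
train_sizes = {
    "ae_detection": 14109,
    "ae_extraction": 4091,
    "drug_extraction": 4260,
    "dosage_extraction": 167,
}

pm_rates = proportional_mixing(train_sizes)
ts_rates = temperature_scaling(train_sizes, temperature=2.0)
```

With a temperature of 1 the two strategies coincide. Higher temperatures flatten the distribution, so a small task such as dosage extraction is sampled more often than its raw size would suggest.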

How to Cite

@inproceedings{raval2021exploring,
  title={Exploring a Unified Sequence-To-Sequence Transformer for Medical Product Safety Monitoring in Social Media},
  author={Raval, Shivam and Sedghamiz, Hooman and Santus, Enrico and Alhanai, Tuka and Ghassemi, Mohammad and Chersoni, Emmanuele},
  booktitle={The 2021 Conference on Empirical Methods in Natural Language Processing},
  year={2021},
  organization={Association for Computational Linguistics (ACL)}
}

