PharmaCoNER using BERT in the Machine Reading Comprehension Framework

Directory

Spanish-MRC
|
├─ dataset-cased
|   ├─ train.txt
|   ├─ dev.txt
|   ├─ test.txt
|   ├─ query_tarin.txt
|   ├─ query_dev.txt
|   └─ query_test.txt
|
├─ dataset-uncased
|   ├─ train.txt
|   ├─ dev.txt
|   ├─ test.txt
|   ├─ query_tarin.txt
|   ├─ query_dev.txt
|   └─ query_test.txt
| 
├─ Spanish-Query
|   ├─ Q1
|   ├─ Q2
|   └─ Q3
|  
├─ models
|   ├─ PubMedBERT-Base-uncased(abstracts + full text)
|   ├─ PubMedBERT-Base-uncased(abstracts only)
|   ├─ BlueBERT-Base-uncased(PubMed)
|   ├─ ClinicalBERT
|   ├─ PubMedBERT-Base-uncased(abstracts only)
|   └─ BioBERT-Base v1.2 (+ PubMed 1M)
| 
├─ processors
|   ├─ bert_ent.py
|   └─ utils_ner.py
| 
├─ metrics
|   └─ ner_metrics.py
| 
├─ output
| 
├─ run.ipynb
| 
└─ README.md

dataset-cased and datset-uncased: provide the processed upper and lower case PharmaCoNER dataset.
Spanish-Query: provide three kinds of Spanish queries.
models: provide the model required for the experiment.
processor: provides read and basic pre-processing procedures for data sets.
metrics: defined the evaluation function of the model.
outputs: used to store the output files.
run.ipynb: modify the dataset and model paths and run to get the corresponding experimental results.

Experimental Environment

Ubuntu 18.04
python 3.9.13
transformers 4.25.1
torch 1.13.0
numpy 1.21.5
pandas 1.4.4
scipy 1.9.1

shed-e/Spanish-MRC

PharmaCoNER using BERT in the Machine Reading Comprehension Framework

Directory

Experimental Environment