Pytorch Solution of Event Extraction Task using BERT on ACE 2005 corpus
-
Prepare ACE 2005 dataset.
-
Use nlpcl-lab/ace2005-preprocessing to preprocess the ACE 2005 dataset into the same format as data/sample.json, then place the output in the data directory as follows:
```
├── data
│   └── test.json
│   └── dev.json
│   └── train.json
│   ...
```
This is the layout of the data directory.
Change to the ERE dataset
-
```
├── data
│   └── test.json
│   └── dev.json
│   └── train.json
│   ...
```
`tokens`: the sentence split into a flat list of token strings:

```json
"tokens": ["Con", "respecto", "a", "la", "pregunta", "que", "se", "deben", "estar", "haciendo", "..."]
```
`sentence`: the raw sentence that the `tokens` list corresponds to:

```json
"sentence": "Con respecto a la pregunta que se deben estar haciendo..."
```
`entity_mentions`: entity mentions, i.e. the words in the text that refer to entities. Entity spans are encoded with the BIOES tagging scheme:

- B (Begin): first token of an entity
- I (Intermediate): token inside an entity
- E (End): last token of an entity
- S (Single): single-token entity
- O (Other): token that is not part of any entity

Here PER denotes a person name, LOC a location, and ORG an organization. B-PER/I-PER mark the first/subsequent tokens of a person name, B-LOC/I-LOC the first/subsequent tokens of a location name, B-ORG/I-ORG the first/subsequent tokens of an organization name, and O marks tokens outside any named entity.

```json
[{"id": "c93832992e8ca0020c806137834bdd38-0-42-303", "start": 6, "end": 7, "entity_type": "PER", "mention_type": "PRO", "text": "se"}]
```

Compare with the ACE format:

```json
"golden-entity-mentions": [
  {
    "text": "we",
    "entity-type": "ORG:Media",
    "head": { "text": "we", "start": 2, "end": 3 },
```
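The BIOES scheme above can be sketched as a small tagging function over ERE-style `entity_mentions` (a hypothetical helper; it assumes `start`/`end` are token offsets with `end` exclusive, consistent with the example where tokens[6:7] is "se"):

```python
def bioes_tags(tokens, mentions):
    """Turn entity-mention spans into one BIOES tag per token."""
    tags = ["O"] * len(tokens)
    for m in mentions:
        start, end, etype = m["start"], m["end"], m["entity_type"]
        if end - start == 1:
            tags[start] = f"S-{etype}"          # single-token entity
        else:
            tags[start] = f"B-{etype}"          # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"          # inside tokens
            tags[end - 1] = f"E-{etype}"        # last token
    return tags

tokens = ["Con", "respecto", "a", "la", "pregunta", "que", "se",
          "deben", "estar", "haciendo", "..."]
mentions = [{"start": 6, "end": 7, "entity_type": "PER"}]
# bioes_tags(tokens, mentions)[6] is "S-PER"; every other tag stays "O".
```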
- Install the packages.
```shell
pip install torch==1.0 pytorch_pretrained_bert==0.6.1 numpy
```
```shell
python train.py
python eval.py --model_path=latest_model.pt
```
Method | Trigger P (%) | Trigger R (%) | Trigger F1 (%) | Argument P (%) | Argument R (%) | Argument F1 (%)
---|---|---|---|---|---|---
JRNN | 66.0 | 73.0 | 69.3 | 54.2 | 56.7 | 55.5
JMEE | 76.3 | 71.3 | 73.7 | 66.8 | 54.9 | 60.3
This model (BERT base) | 63.4 | 71.1 | 67.7 | 48.5 | 34.1 | 40.0
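The F1 columns are the harmonic mean of precision and recall; a quick sanity check against the reported JRNN trigger numbers (for other rows, rounding of P and R can shift the last digit slightly):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# round(f1(66.0, 73.0), 1) == 69.3  (JRNN trigger classification)
```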
This model's argument classification performance is low even though a pretrained BERT model is used; the model is currently being updated to improve it.
- Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation (EMNLP 2018), Liu et al. [paper]
- lx865712528's EMNLP2018-JMEE repository [github]
- Kyubyong's bert_ner repository [github]
```python
train(model, train_iter, optimizer, criterion)
```
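A `train()` function with that signature might look like the following framework-agnostic sketch (hypothetical; the actual train.py may compute losses over trigger and argument labels differently):

```python
def train(model, train_iter, optimizer, criterion):
    """One epoch over train_iter, assuming (inputs, labels) batches."""
    model.train()                       # switch to training mode
    for batch in train_iter:
        inputs, labels = batch          # assumed batch layout
        optimizer.zero_grad()           # reset accumulated gradients
        logits = model(inputs)          # forward pass
        loss = criterion(logits, labels)
        loss.backward()                 # backpropagate
        optimizer.step()                # update parameters
```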