/CCF_IE

2019百度语言与智能技术竞赛信息抽取赛代5名代码

Primary LanguagePython

Environment:

python 3+ tensorflow 1.10+ keras 2.2.4+

Install

我们将所使用的依赖环境已打包为 requirements.txt keras-bert from here https://github.com/CyberZHG/keras-bert

pip install keras-bert
conda install --yes --file requirements.txt

Datas

请将以下文件放入对应文件夹
1, ./inputs (原始数据存放路径)
应包含以下文件: train_data.json, dev_data.json, all_50_chemas, test_data_postag.json

2, ./bert (存放预训练模型权重路径) 应包含一下文件:
./bert/chinese_L-12_H-768_A-12/bert_config.json;
./bert/chinese_L-12_H-768_A-12/bert_model.ckpt;
./bert/chinese_L-12_H-768_A-12/vocab.txt

以下是预训练权重下载地址
chinese_L-12_H-768_A-12 :https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip

Usage

run python main.py gpu_num1 gpu_num2 gpu_num3 gpu_num4 gpu_num5
The test datasets predictions will be saved into a file called final_data.json in the outputs, and the models be trained will be saved into ensemble_part_x.weights in the models.

We used multiple GPUs for training and prediction, so we also specified multiple GPUS for training and forecasting in this program.

我们在苏神baseline上的工作:
1, BERT
2,优化了标注方式,针对重叠关系的重新设定了多信息的标注方式
3,简化了下游模型结构,尝试了self-attention和普通点乘attention
4,投票方式简单集成
5,规则数据后处理和预处理。

参考文献: Global Normalization of Convolutional Neural Networks for Joint Entity and Relation Classification
One for All Neural Joint Modeling of Entities and Even
Table filling multi-task recurrent neural network for joint entity and relation extraction
Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
End-to-End Neural Relation Extraction with Global Optimization

结果

A榜:0.889, B榜:0.8872 , 最终B榜第五(原本第六,第四名放弃)。
ps: A,B榜有差距差距的原因,是因为我不小心把几个用来集成的模型权重给用debug数据跑的模型覆盖了。。。。重新跑全部EPOCH时间又不够,就以重新跑了几个更少EPOCH的模型替代,导致重新预测效果变差,坑了。。。。