A biomedical event extraction system by a novel Combination Strategy based on hybrid deep neural networks.
Online demo : http://www.predictor.xin/event_extraction/
Zhu L, Zheng H. Biomedical event extraction with a novel combination strategy based on hybrid deep neural networks[J]. BMC bioinformatics, 2020, 21(1): 47. https://rdcu.be/b1Pn1
(0. The train corpus contains .txt/.a1/.a2 files, placed in /txt
/a1
/a2
directories respectively.)
- Run
corpus_table.py
, convert .txt/.a1/.a2 files into uniform .csv files (saved in/table
). - Run
a2_preprocess.py
, convert .a2 files into structured .csv files (saved in/a2_table
). - Run
a2_preprocess_modification.py
, convert event modifications in .a2 files into structured .csv files (saved in/a2_modification_table
). - The test corpus should be generated by repeating above process and saved in others directories, e.g.
/table_test
,/a2_table_test
,/a2_modification_table_test
.
- Run
my_word2vec.py
to generate the pre-trained word vectors (better to modify the code to use a large external corpus). - Run
pretrain_char_cnn.py
to generate the pre-trained character-level CNN. - Run
train_net.py
to train the main neural networks, in formatpython train_net.py NUMBER
with a argumentNUMBER
which number the trained networks, the trained networks are saved in/nets
.
- Run
test_generate_strategy_best_ensemble.py
to extract events from testset in/table_test
, in formatpython test_generate_strategy_best_ensemble.py NUM1 NUM2
whereNUM1
denotes the number of ensemble epoches andNUM2
denotes the number of ensemble training. Meanwhile, the base number forNUM1
andNUM2
should be set in codes. The result .a2 files is saved in/a2_result
. (1. Thetest_generate_event_eval_ensemble.py
is a variation method with same usage.) - Use
python BioNLP-evaluation-CG.py -r /GOLDEN_STANDARD_DIR /PARENT_DIR/a2_result/*.a2
to evaluate the performance (for CG).
- Pytorch
- Gensim
- NLTK
- Pandas
- sklearn
TO BE CONTINUED ..