BERT_Chinese_MRC

A modification of the official BERT source code, adapted to the Chinese QA task DRCD.

Inspired by BERT-for-Chinese-Question-Answering

Changes

Starting from read_squad_example.py, the tokenization is modified for Chinese, and examples whose answer_start cannot be matched are removed
ToDo
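
The answer_start filtering mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: it assumes DRCD follows the SQuAD v1.1 JSON layout (`context`, `qas`, `answers`, `text`, `answer_start`), and the helper names `answer_matches` and `filter_examples` as well as the toy paragraph are hypothetical.

```python
def answer_matches(context, answer_text, answer_start):
    """Return True if answer_text really begins at answer_start in context."""
    return context[answer_start:answer_start + len(answer_text)] == answer_text


def filter_examples(paragraph):
    """Keep only the QA pairs whose first answer lines up with the context.

    Assumes a SQuAD-style paragraph dict; both helpers are hypothetical.
    """
    kept = []
    for qa in paragraph["qas"]:
        answer = qa["answers"][0]
        if answer_matches(paragraph["context"], answer["text"], answer["answer_start"]):
            kept.append(qa)
    return kept


# Toy paragraph made up for illustration; q2 has a wrong answer_start on purpose.
paragraph = {
    "context": "BERT是一個語言模型。",
    "qas": [
        {"id": "q1", "question": "BERT是什麼?",
         "answers": [{"text": "語言模型", "answer_start": 7}]},
        {"id": "q2", "question": "什麼是語言模型?",
         "answers": [{"text": "BERT", "answer_start": 1}]},
    ],
}

kept_ids = [qa["id"] for qa in filter_examples(paragraph)]
print(kept_ids)
```

Comparing the slice against the answer text works on raw characters, so it does not depend on how the context is tokenized afterwards.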

Usage

Train&Prediction

python run_drcd.py \
  --vocab_file=$BERT_MODEL_DIR/vocab.txt \
  --bert_config_file=$BERT_MODEL_DIR/bert_config.json \
  --init_checkpoint=$BERT_MODEL_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$DRCD_DIR/DRCD_training.json \
  --do_predict=True \
  --predict_file=$DRCD_DIR/DRCD_test.json \
  --train_batch_size=6 \
  --learning_rate=3e-5 \
  --num_train_epochs=3.0 \
  --do_lower_case=True \
  --max_seq_length=512 \
  --doc_stride=128 \
  --output_dir=$OUTPUT_DIR/
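
To make the interaction between `--max_seq_length` and `--doc_stride` concrete, here is a small sketch of the sliding-window split used by the official run_squad.py for contexts longer than one input sequence. It assumes run_drcd.py keeps that logic unchanged; the function name `make_doc_spans` is hypothetical.

```python
import collections

DocSpan = collections.namedtuple("DocSpan", ["start", "length"])


def make_doc_spans(num_doc_tokens, num_query_tokens,
                   max_seq_length=512, doc_stride=128):
    """Split a long context into overlapping windows (hypothetical helper)."""
    # Reserve room for [CLS], two [SEP] tokens, and the question tokens.
    max_tokens_for_doc = max_seq_length - num_query_tokens - 3
    if max_tokens_for_doc <= 0:
        raise ValueError("question is too long for max_seq_length")

    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append(DocSpan(start, length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the context
        start += min(length, doc_stride)
    return spans


# A 700-token context with a 9-token question leaves 500 tokens per window.
spans = make_doc_spans(num_doc_tokens=700, num_query_tokens=9)
for span in spans:
    print(span.start, span.length)
```

With the values from the command above, consecutive windows start 128 tokens apart, so an answer near a window edge usually appears with more context in a neighbouring window.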

Evaluate

python eva.py $DRCD/DRCD_testing.json $OUTPUT_DIR/prediction.json

Results

EM: 85.65702834239909
F1: 91.78050628879733
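
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute EM and F1 for Chinese answers, using character-level overlap. It is an assumption that eva.py works along these lines; the function names `exact_match` and `char_f1` are hypothetical, and the script's actual normalization may differ.

```python
from collections import Counter


def exact_match(prediction, truth):
    """1.0 if the two strings are identical after trimming whitespace, else 0.0."""
    return float(prediction.strip() == truth.strip())


def char_f1(prediction, truth):
    """Character-level F1 between a predicted and a reference answer."""
    pred_chars = [c for c in prediction if not c.isspace()]
    true_chars = [c for c in truth if not c.isspace()]
    # Multiset intersection counts each shared character at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_chars) & Counter(true_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(true_chars)
    return 2 * precision * recall / (precision + recall)


print(exact_match("語言模型", "語言模型"))
print(char_f1("語言", "語言模型"))
```

Reported scores are typically the averages of these per-question values over the test set, expressed as percentages.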

colinsongf/BERT_Chinese_MRC_drcd
