Problems reproducing Roberta-Large and ELECTRA-Large
yiyaxiaozhi opened this issue · 1 comment
yiyaxiaozhi commented
The environment I am using is:
pytorch 1.4.0
transformers 2.8.0
I followed the training command from the docs at https://github.com/thunlp/OpenMatch/blob/master/docs/experiments-msmarco.md:
CUDA_VISIBLE_DEVICES=0 \
python train.py \
-task ranking \
-model bert \
-train ./data/train.jsonl \
-max_input 3000000 \
-save ./checkpoints/electra_large.bin \
-dev queries=./data/queries.dev.small.tsv,docs=./data/collection.tsv,qrels=./data/qrels.dev.small.tsv,trec=./data/run.msmarco-passage.dev.small.100.trec \
-qrels ./data/qrels.dev.small.tsv \
-vocab google/electra-large-discriminator \
-pretrain google/electra-large-discriminator \
-res ./results/electra_large.trec \
-metric mrr_cut_10 \
-max_query_len 32 \
-max_doc_len 256 \
-epoch 1 \
-batch_size 2 \
-lr 5e-6 \
-eval_every 10000
At around global step ~180k (local step ~720k), the validation MRR starts to drop steadily from 0.33, and the best MRR over the whole run only reaches 0.336. What could I be missing that would cause this?
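For reference, this is how I understand the mrr_cut_10 metric reported above; a minimal sketch of MRR@10 over a ranked run, not OpenMatch's actual evaluation code, and the dict-based inputs are just placeholders for this example:

# Sketch of MRR@10: reciprocal rank of the first relevant doc within the top 10,
# averaged over queries. Inputs are hypothetical:
#   run:   {qid: [docid, ...] ranked by descending score}
#   qrels: {qid: set of relevant docids}
def mrr_cut_10(run, qrels):
    total = 0.0
    for qid, ranked_docs in run.items():
        relevant = qrels.get(qid, set())
        for rank, docid in enumerate(ranked_docs[:10], start=1):
            if docid in relevant:
                total += 1.0 / rank  # only the first relevant doc counts
                break
    return total / len(run)

# Example: the first relevant passage sits at rank 3, so MRR@10 = 1/3.
print(mrr_cut_10({"q1": ["d5", "d2", "d9"]}, {"q1": {"d9"}}))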
Yu-Shi commented
Could you try increasing the batch size? You could use multi-GPU training or gradient accumulation.
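In case it helps, here is a minimal sketch of gradient accumulation in plain PyTorch to simulate a larger effective batch size on a single GPU. This is a generic illustration, not OpenMatch's train.py; the toy model, loss, and fake data are placeholders:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)                      # placeholder model
criterion = nn.MSELoss()                      # placeholder loss
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6)

accumulation_steps = 8                        # effective batch size = 2 * 8 = 16
batches = [(torch.randn(2, 10), torch.randn(2, 1)) for _ in range(32)]  # fake mini-batches of size 2

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(batches):
    loss = criterion(model(inputs), labels)
    (loss / accumulation_steps).backward()    # scale so accumulated gradients average over the big batch
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                      # one update per 8 mini-batches
        optimizer.zero_grad()

The memory cost per step stays that of a batch of 2, but each optimizer update sees gradients averaged over 16 examples, which usually stabilizes training compared with batch_size 2.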