About Two-stage
I'm sorry to bother you again.
I want to know whether the two-stage code for the paper 'A BERT-based two-stage model for Chinese Chengyu recommendation' uses only `train_pretrain.py` and `train_official.py`.
What is the difference between stage-1 pretraining and using `train_pretrain.py`?
Also, what is the difference among w/o Pre-Training, w/o Fine-Tuning, w/o 𝐿V, and w/o 𝐿A? (I don't quite understand what you're showing in the paper.)
Could you describe these in more detail? Thanks very much.
Generally speaking, stage one is Chengyu-oriented pretraining on a large corpus. This is done through `train_pretrain.py`. However, this stage may require huge computation power, so we released the https://huggingface.co/visualjoyce/chengyubert_2stage_stage1_wwm_ext model on Hugging Face.
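If you just want to build on the released stage-1 checkpoint, loading it with `transformers` might look like the following. This is a minimal sketch assuming the checkpoint works with the standard BERT classes; the repository's own scripts may load it through their own wrappers.

```python
# Minimal sketch: pulling the released stage-1 checkpoint from the Hugging Face hub.
# Assumption: the weights load into the standard `transformers` BERT classes; the
# repository's training scripts may wrap the model differently.
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "visualjoyce/chengyubert_2stage_stage1_wwm_ext"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

inputs = tokenizer("他们脚踏实地地工作。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```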
We have details about the models for each ablation study in Section 4.2.
w/o Pre-Training means running stage 2 directly from `hfl/chinese-bert-wwm-ext`.
w/o Fine-Tuning means zero-shot evaluation using the stage-1 model.
w/o 𝐿V and w/o 𝐿A correspond to whether we use the `original`, `enlarged`, or `combined` loss function in fine-tuning, which can be found in the code (a sketch follows below).
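For intuition only, here is a hypothetical sketch of how the two loss terms could be combined or dropped. The function and argument names are placeholders, not the repository's actual identifiers, and the real mapping of `original`/`enlarged`/`combined` onto these terms is defined in the code.

```python
# Hypothetical sketch of the ablated loss terms; names are placeholders and the
# exact formulation lives in the repository's training code.
import torch.nn.functional as F

def two_stage_loss(vocab_logits, vocab_target, option_logits, option_target,
                   use_l_v=True, use_l_a=True):
    l_v = F.cross_entropy(vocab_logits, vocab_target)      # loss over the idiom vocabulary
    l_a = F.cross_entropy(option_logits, option_target)    # loss over the 7 candidate options
    loss = 0.0
    if use_l_v:    # setting this to False mimics the "w/o L_V" ablation
        loss = loss + l_v
    if use_l_a:    # setting this to False mimics the "w/o L_A" ablation
        loss = loss + l_a
    return loss
```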
The vocabulary is the same 32k idioms, although during fine-tuning the model only updates the first 3848 entries of the vocabulary.
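In case it helps, one common way to keep the full vocabulary while updating only its first rows is a gradient hook on the embedding weight. This is a hypothetical sketch, not necessarily how the repository implements it; the sizes come from the numbers above.

```python
# Hypothetical sketch: keep all ~32k idiom embeddings but let only the first
# 3848 rows receive gradient updates during fine-tuning.
import torch
import torch.nn as nn

NUM_IDIOMS = 32000      # full idiom vocabulary (~32k, per the discussion above)
NUM_TRAINABLE = 3848    # entries actually updated during fine-tuning
HIDDEN_SIZE = 768

idiom_embedding = nn.Embedding(NUM_IDIOMS, HIDDEN_SIZE)

def zero_frozen_rows(grad):
    # Zero the gradient for every row beyond the first NUM_TRAINABLE entries,
    # so the optimizer leaves those rows unchanged.
    mask = torch.zeros_like(grad)
    mask[:NUM_TRAINABLE] = 1.0
    return grad * mask

idiom_embedding.weight.register_hook(zero_frozen_rows)
```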
Options are used in Stage 2; they are the candidate sets of seven options.
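For reference, a stage-2 example is roughly a blank in a passage plus a candidate set of seven idioms. The field names below are only illustrative, not the dataset's exact schema.

```python
# Illustrative (not the exact schema): each blank comes with seven candidate
# idioms, one of which is the correct answer for that blank.
example = {
    "context": "他做事一向脚踏实地，从不#idiom#。",
    "candidates": ["好高骛远", "一帆风顺", "画蛇添足", "雪中送炭",
                   "守株待兔", "亡羊补牢", "举一反三"],
    "label": 0,  # index of the correct idiom in the candidate list
}
assert len(example["candidates"]) == 7
```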
Oh, thanks. Most of my problems are solved for now. Thank you very much for your kind answer.