VisualJoyce/ChengyuBERT

About Two-stage

Closed this issue · 4 comments

I'm sorry to bother you again.

I want to know whether the code for the two-stage model in the paper 'A BERT-based two-stage model for Chinese Chengyu recommendation' only uses 'train_pretrain.py' and 'train_official.py'.
Also, what's the difference between the stage-1 pretraining and using 'train_pretrain.py'?

What's more, what's the difference among w/o Pre-Training, w/o Fine-Tuning, w/o 𝐿V, and w/o 𝐿A? (I don't quite understand what you're showing in your paper.)

Could you describe this in more detail? Thanks very much.

Generally speaking, stage one is Chengyu-oriented pretraining on a large corpus. This process is done through train_pretrain.py. Since this stage can require a lot of computing power, we released the resulting model at https://huggingface.co/visualjoyce/chengyubert_2stage_stage1_wwm_ext on Hugging Face.
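For reference, here is a minimal sketch of loading that released stage-1 checkpoint with the transformers library. It assumes the checkpoint is compatible with the standard BERT classes; the repo's own training scripts may wrap it in custom model code.

```python
# Minimal sketch (not the repo's own loading code): pull the released
# stage-1 checkpoint, assuming it loads with the standard BERT classes.
from transformers import AutoTokenizer, AutoModel

model_name = "visualjoyce/chengyubert_2stage_stage1_wwm_ext"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a sentence with a blank for the idiom; the repo's own scripts
# handle candidate scoring on top of these encodings.
inputs = tokenizer("他做事一向[MASK]，从不拖泥带水。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```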

We give details about the models for each ablation study in Section 4.2.

  • w/o Pre-Training means running stage 2 directly from hfl/chinese-bert-wwm-ext.
  • w/o Fine-Tuning means zero-shot evaluation using the model from stage 1.
  • w/o 𝐿V and w/o 𝐿A control whether we use the original, enlarged, or combined loss function during fine-tuning, which can be found in the code (see the sketch after this list).
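The sketch below is only an illustration of how such a combined objective is typically assembled: a vocabulary-level cross-entropy term (𝐿V) plus an option-level term (𝐿A), with either term dropped for the corresponding ablation. The exact formulation and weighting live in the repo's code; the function and argument names here are hypothetical.

```python
import torch
import torch.nn.functional as F

def combined_loss(vocab_logits, option_logits, vocab_target, option_target,
                  use_lv=True, use_la=True):
    """Hypothetical combined objective: a vocabulary-level term (L_V) plus
    an option-level term (L_A). Dropping either term corresponds to the
    'w/o L_V' / 'w/o L_A' ablations; the paper's exact form may differ."""
    loss = vocab_logits.new_zeros(())
    if use_lv:
        # Cross-entropy over the full idiom vocabulary (~32k entries).
        loss = loss + F.cross_entropy(vocab_logits, vocab_target)
    if use_la:
        # Cross-entropy over the seven candidate options of each question.
        loss = loss + F.cross_entropy(option_logits, option_target)
    return loss
```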

Wow, OK.
I want to know whether the small candidate set is 3848 or 7 (the candidates of one question). Which one is it?

And are the two vocabularies the same?


The vocabulary is the same 32k idioms, although during fine-tuning the model only updates the first 3848 entries of the vocabulary.

Options are used in stage 2, where each question has a candidate set of seven options.
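As an illustration (not the repo's actual API), stage-2 scoring can be restricted to each question's seven candidates by gathering their logits out of the full vocabulary-sized output; the sizes and names below are placeholders.

```python
import torch

# Illustrative only: restrict scoring to each question's candidate set.
batch_size, vocab_size = 2, 32000      # placeholder for the ~32k idiom vocabulary
vocab_logits = torch.randn(batch_size, vocab_size)

# Each question supplies 7 candidate idiom ids; answers come from the first
# 3848 vocabulary entries, which are the ones updated during fine-tuning.
option_ids = torch.randint(0, 3848, (batch_size, 7))

option_logits = vocab_logits.gather(1, option_ids)   # (batch, 7)
best = option_logits.argmax(dim=1, keepdim=True)     # index within the 7 options
prediction = option_ids.gather(1, best).squeeze(1)   # predicted idiom ids
print(prediction)
```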

Oh, thanks. Most of my problems are solved for now. Thank you very much for your kind answer.