About Two-stage
I'm sorry to bother you again.
I want to know whether the two-stage code for the paper 'A BERT-based two-stage model for Chinese Chengyu recommendation' uses only `train_pretrain.py` and `train_official.py`.
What is the difference between stage-1 pretraining and using `train_pretrain.py`?
Also, what is the difference among w/o Pre-Training, w/o Fine-Tuning, w/o 𝐿V, and w/o 𝐿A? (I don't quite understand what you're showing in the paper.)
Could you describe these in more detail? Thanks very much.
Generally speaking, stage one is Chengyu-oriented pretraining on a large corpus. This is done through `train_pretrain.py`. However, this stage may require huge computation power, so we released the https://huggingface.co/visualjoyce/chengyubert_2stage_stage1_wwm_ext model on Hugging Face.
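If you just want to build on the released stage-1 checkpoint, loading it with `transformers` might look like the following. This is a minimal sketch assuming the checkpoint works with the standard BERT classes; the repository's own scripts may load it through their own wrappers.

```python
# Minimal sketch: pulling the released stage-1 checkpoint from the Hugging Face hub.
# Assumption: the weights load into the standard `transformers` BERT classes; the
# repository's training scripts may wrap the model differently.
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "visualjoyce/chengyubert_2stage_stage1_wwm_ext"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

inputs = tokenizer("他们脚踏实地地工作。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```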
We have details about the models for each ablation study in Section 4.2.
w/o Pre-Training means running stage 2 directly from `hfl/chinese-bert-wwm-ext`.
w/o Fine-Tuning means zero-shot evaluation using the stage-1 model.
w/o 𝐿V and w/o 𝐿A correspond to whether we use the `original`, `enlarged`, or `combined` loss function in fine-tuning, which can be found in the code (a sketch follows below).
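For intuition only, here is a hypothetical sketch of how the two loss terms could be combined or dropped. The function and argument names are placeholders, not the repository's actual identifiers, and the real mapping of `original`/`enlarged`/`combined` onto these terms is defined in the code.

```python
# Hypothetical sketch of the ablated loss terms; names are placeholders and the
# exact formulation lives in the repository's training code.
import torch.nn.functional as F

def two_stage_loss(vocab_logits, vocab_target, option_logits, option_target,
                   use_l_v=True, use_l_a=True):
    l_v = F.cross_entropy(vocab_logits, vocab_target)      # loss over the idiom vocabulary
    l_a = F.cross_entropy(option_logits, option_target)    # loss over the 7 candidate options
    loss = 0.0
    if use_l_v:    # setting this to False mimics the "w/o L_V" ablation
        loss = loss + l_v
    if use_l_a:    # setting this to False mimics the "w/o L_A" ablation
        loss = loss + l_a
    return loss
```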
The vocabulary is the same 32k idioms, although during fine-tuning the model only updates the first 3848 entries of the vocabulary.
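In case it helps, one common way to keep the full vocabulary while updating only its first rows is a gradient hook on the embedding weight. This is a hypothetical sketch, not necessarily how the repository implements it; the sizes come from the numbers above.

```python
# Hypothetical sketch: keep all ~32k idiom embeddings but let only the first
# 3848 rows receive gradient updates during fine-tuning.
import torch
import torch.nn as nn

NUM_IDIOMS = 32000      # full idiom vocabulary (~32k, per the discussion above)
NUM_TRAINABLE = 3848    # entries actually updated during fine-tuning
HIDDEN_SIZE = 768

idiom_embedding = nn.Embedding(NUM_IDIOMS, HIDDEN_SIZE)

def zero_frozen_rows(grad):
    # Zero the gradient for every row beyond the first NUM_TRAINABLE entries,
    # so the optimizer leaves those rows unchanged.
    mask = torch.zeros_like(grad)
    mask[:NUM_TRAINABLE] = 1.0
    return grad * mask

idiom_embedding.weight.register_hook(zero_frozen_rows)
```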
Options are used in Stage 2; they are the candidate sets of seven options.
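For reference, a stage-2 example is roughly a blank in a passage plus a candidate set of seven idioms. The field names below are only illustrative, not the dataset's exact schema.

```python
# Illustrative (not the exact schema): each blank comes with seven candidate
# idioms, one of which is the correct answer for that blank.
example = {
    "context": "他做事一向脚踏实地，从不#idiom#。",
    "candidates": ["好高骛远", "一帆风顺", "画蛇添足", "雪中送炭",
                   "守株待兔", "亡羊补牢", "举一反三"],
    "label": 0,  # index of the correct idiom in the candidate list
}
assert len(example["candidates"]) == 7
```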
Oh, thanks. Most of my problems are solved for now. Thank you very much for your kind answer.