VisualJoyce/ChengyuBERT

About the huggingface pretrained model


Hi, how can I use the huggingface pretrained model at https://huggingface.co/visualjoyce/chengyubert_2stage_stage1_wwm_ext to produce chengyu embeddings? chinese-BERT-wwm only produces token-based embeddings.

Vimos commented

This is a contextualized embedding. You may use [MASK] to replace your idiom, then use the hidden state of the masked token together with the hidden state of [CLS]. There is also a separate idiom embedding, which you can further concatenate. See the figure below.

[figure: ChengyuBERT architecture, combining the [CLS] and masked-idiom hidden states with the separate idiom embedding]
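Concretely, here is a minimal sketch of that recipe, assuming the checkpoint's encoder loads as a plain `transformers` `BertModel` (the separate idiom-embedding table lives in this repo's own model code, so only the contextual part is shown):

```python
# Minimal sketch, assuming the checkpoint loads as a plain BertModel.
# The separate idiom-embedding table is defined in this repo's own model
# classes and is not reachable through BertModel, so this covers only the
# contextual part: [CLS] plus the masked-token hidden state.
import torch
from transformers import BertModel, BertTokenizer

NAME = "visualjoyce/chengyubert_2stage_stage1_wwm_ext"
# Assumption: if the checkpoint ships no tokenizer files, the
# chinese-BERT-wwm tokenizer it was pretrained from should be a drop-in.
tokenizer = BertTokenizer.from_pretrained(NAME)
model = BertModel.from_pretrained(NAME)
model.eval()

# Replace the idiom with a single [MASK] token.
text = "赵括的[MASK]使得赵国在长平之战中大败"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_size)

# [CLS] is always at position 0; locate the single [MASK] position.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
cls_vec = hidden[0, 0]
mask_vec = hidden[0, mask_pos]

# Contextualized idiom representation: concatenation of the two states.
idiom_repr = torch.cat([cls_vec, mask_vec], dim=-1)  # (2 * hidden_size,)
```

To reproduce the full representation from the figure, the idiom-embedding table would additionally have to be pulled from this repo's own model code and concatenated on top.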

Since pretraining may cost huge computation power, I am trying to directly use the huggingface pretrained model to produce chengyu contextualized embeddings. For example, when "赵括的纸上谈兵使得赵国在长平之战中大败" ("Zhao Kuo's armchair strategizing led to the State of Zhao's crushing defeat at the Battle of Changping") is input to this pretrained model, I only get separate embeddings for '纸', '上', '谈', and '兵', and cannot obtain a single embedding for '纸上谈兵'.

Vimos commented

In this case, you need to change the input from "赵括的纸上谈兵使得赵国在长平之战中大败" to "赵括的[MASK]使得赵国在长平之战中大败".
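In other words, the only change is covering the idiom span with a single [MASK] before tokenization; a hypothetical `mask_idiom` helper (not part of the repo) makes this explicit:

```python
# Hypothetical helper: rewrite the input so the whole idiom is covered by
# one [MASK], so the model yields a single hidden state for the chengyu
# instead of four per-character embeddings.
def mask_idiom(sentence: str, idiom: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of `idiom` with `mask_token`."""
    return sentence.replace(idiom, mask_token, 1)

masked = mask_idiom("赵括的纸上谈兵使得赵国在长平之战中大败", "纸上谈兵")
print(masked)  # 赵括的[MASK]使得赵国在长平之战中大败
```

The masked sentence can then be fed through the extraction code above, with the [MASK] hidden state standing in for the whole idiom.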

It works, thanks a lot!

Vimos commented

Glad to hear that!