VisualJoyce/ChengyuBERT

About the huggingface pretrained model


Hi, how can I use the huggingface pretrained model at https://huggingface.co/visualjoyce/chengyubert_2stage_stage1_wwm_ext to produce chengyu embeddings? chinese-BERT-wwm only produces token-based embeddings.

Vimos commented

This is a contextualized embedding. You may use [MASK] to replace your idiom, then use the hidden state of the masked token together with the hidden state of [CLS]. There is also a separate idiom embedding, which you can further concatenate. See the figure below.

[figure: ChengyuBERT architecture, combining the [CLS] and masked-idiom hidden states with the separate idiom embedding]
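Concretely, here is a minimal sketch of that recipe, assuming the checkpoint's encoder loads as a plain `transformers` `BertModel` (the separate idiom-embedding table lives in this repo's own model code, so only the contextual part is shown):

```python
# Minimal sketch, assuming the checkpoint loads as a plain BertModel.
# The separate idiom-embedding table is defined in this repo's own model
# classes and is not reachable through BertModel, so this covers only the
# contextual part: [CLS] plus the masked-token hidden state.
import torch
from transformers import BertModel, BertTokenizer

NAME = "visualjoyce/chengyubert_2stage_stage1_wwm_ext"
# Assumption: if the checkpoint ships no tokenizer files, the
# chinese-BERT-wwm tokenizer it was pretrained from should be a drop-in.
tokenizer = BertTokenizer.from_pretrained(NAME)
model = BertModel.from_pretrained(NAME)
model.eval()

# Replace the idiom with a single [MASK] token.
text = "赵括的[MASK]使得赵国在长平之战中大败"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_size)

# [CLS] is always at position 0; locate the single [MASK] position.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
cls_vec = hidden[0, 0]
mask_vec = hidden[0, mask_pos]

# Contextualized idiom representation: concatenation of the two states.
idiom_repr = torch.cat([cls_vec, mask_vec], dim=-1)  # (2 * hidden_size,)
```

To reproduce the full representation from the figure, the idiom-embedding table would additionally have to be pulled from this repo's own model code and concatenated on top.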

Since pretraining may cost huge computation power, I am trying to directly use the huggingface pretrained model to produce chengyu contextualized embeddings. For example, when "赵括的纸上谈兵使得赵国在长平之战中大败" ("Zhao Kuo's armchair strategizing led to the State of Zhao's crushing defeat at the Battle of Changping") is input to this pretrained model, I only get separate embeddings for '纸', '上', '谈', and '兵', and cannot obtain a single embedding for '纸上谈兵'.

Vimos commented

In this case, you need to change the input from "赵括的纸上谈兵使得赵国在长平之战中大败" to "赵括的[MASK]使得赵国在长平之战中大败".
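In other words, the only change is covering the idiom span with a single [MASK] before tokenization; a hypothetical `mask_idiom` helper (not part of the repo) makes this explicit:

```python
# Hypothetical helper: rewrite the input so the whole idiom is covered by
# one [MASK], so the model yields a single hidden state for the chengyu
# instead of four per-character embeddings.
def mask_idiom(sentence: str, idiom: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of `idiom` with `mask_token`."""
    return sentence.replace(idiom, mask_token, 1)

masked = mask_idiom("赵括的纸上谈兵使得赵国在长平之战中大败", "纸上谈兵")
print(masked)  # 赵括的[MASK]使得赵国在长平之战中大败
```

The masked sentence can then be fed through the extraction code above, with the [MASK] hidden state standing in for the whole idiom.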

It works, thanks a lot!

Vimos commented

Glad to hear that!