allanj/pytorch_neural_crf

Would you share more details for using BERT?

ybdesire opened this issue · 3 comments

Would you share more detailed steps for using BERT as a word embedding?

It seems BERT is not supported yet?
https://github.com/allanj/pytorch_lstmcrf/blob/master/trainer.py#L56

Since I'm actually working on (and debugging) another version, in another branch, that uses BERT (from HuggingFace) as a direct encoder, which allows us to fine-tune BERT, I did not provide more details on using BERT as a static embedding.

The quick answer is to use the bert-as-service repo: https://github.com/hanxiao/bert-as-service

You can extract the representations for your sentences and save them with pickle.
One thing you should keep in mind is that bert-as-service gives you wordpiece token representations instead of word representations.
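
Here's a minimal sketch of that extraction step, assuming a bert-serving-server is already running with `-pooling_strategy NONE` so the client returns per-wordpiece vectors rather than a single pooled sentence vector:

```python
import pickle

from bert_serving.client import BertClient

sentences = ["EU rejects German call", "Peter Blackburn"]

bc = BertClient()  # connects to a local bert-serving-server by default
# With -pooling_strategy NONE the result has shape
# (num_sentences, max_seq_len, 768), zero-padded, including [CLS]/[SEP] slots.
vecs = bc.encode(sentences)

# Save the raw wordpiece-level vectors; word-level pooling happens afterwards.
with open("bert_vecs.pkl", "wb") as f:
    pickle.dump(vecs, f)
```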

You have three options (sketched in code after this list):

  1. Use the first wordpiece representation of a word to represent the word
  2. Use the last wordpiece representation of a word to represent the word
  3. Use the average of the wordpiece representations of a word to represent the word
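
Here's a minimal sketch of all three pooling choices. `word_to_pieces` (the mapping from each word to the indices of its wordpieces) is a hypothetical input here; in practice you recover it by re-running the wordpiece tokenizer on your words:

```python
import numpy as np

def pool_word_vectors(piece_vecs, word_to_pieces, strategy="first"):
    """Turn (num_pieces, dim) wordpiece vectors into (num_words, dim) word vectors."""
    word_vecs = []
    for piece_ids in word_to_pieces:
        pieces = piece_vecs[piece_ids]           # vectors for this word's wordpieces
        if strategy == "first":                  # option 1
            word_vecs.append(pieces[0])
        elif strategy == "last":                 # option 2
            word_vecs.append(pieces[-1])
        elif strategy == "average":              # option 3
            word_vecs.append(pieces.mean(axis=0))
        else:
            raise ValueError("unknown strategy: " + strategy)
    return np.stack(word_vecs)
```

For example, if "Blackburn" is split into ["Black", "##burn"] at wordpiece positions 4 and 5, its entry in `word_to_pieces` would be `[4, 5]`.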

Since you have this requirement, I will try to provide a script for it later on as well.

Please check out the latest commit 9afe2ed.
You should be able to process BERT representations now.

The script can be found in preprocess/get_bert_vec.py.

I use bert-as-service for now.
Alternatively, you can just use the BERT implementation provided by HuggingFace.
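
If you go the HuggingFace route, here's a minimal sketch, assuming a recent `transformers` version (a fast tokenizer is needed for the `word_ids()` alignment):

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")
model.eval()

words = ["EU", "rejects", "German", "call"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]  # (num_pieces, 768)

# word_ids() maps each wordpiece position back to its source word (None for
# [CLS]/[SEP]); here we take the first wordpiece of each word (option 1 above).
first_piece = {}
for pos, wid in enumerate(enc.word_ids()):
    if wid is not None and wid not in first_piece:
        first_piece[wid] = pos
word_vecs = torch.stack([hidden[first_piece[i]] for i in range(len(words))])
```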

@allanj wow, thank you very much for the quick help. I will try it later. :)