Outputting the embedding of each word

Question

Opened this issue 5 years ago · 1 comment

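Hello, how should I modify your code so that it outputs the word embeddings from the final fine-tuned model?
I also noticed that you do not perform word segmentation and instead treat each Chinese character as one unit. If I want to derive the embedding of a word from the embeddings of its characters, is there a good way to do that? Thanks!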

Answer 1 · 2019-12-01T13:28:13.000Z

@ECNU109

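Hello. In my opinion the simplest approach is to wrap the word in [CLS] and [SEP] (for example, [CLS]头痛[SEP], where 头痛 means "headache"), feed it to the pretrained BERT, and take the last-layer embedding of [CLS] as the embedding of the word.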
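As a rough illustration of the [CLS] approach, here is a minimal sketch in plain Python. It is not code from this repository: `wrap_word` and `cls_embedding` are hypothetical helper names, the one-token-per-character split is an assumption that only suits character-level Chinese input, and the vectors are placeholders standing in for real last-layer outputs of the model.

```python
from typing import List, Sequence


def wrap_word(word: str) -> List[str]:
    # Assumption: one token per character, as with character-level Chinese input.
    return ["[CLS]"] + list(word) + ["[SEP]"]


def cls_embedding(tokens: Sequence[str],
                  last_layer: Sequence[Sequence[float]]) -> List[float]:
    # last_layer holds one vector per token, taken from the final BERT layer.
    if len(tokens) != len(last_layer):
        raise ValueError("expected one vector per token")
    if tokens[0] != "[CLS]":
        raise ValueError("the sequence must start with [CLS]")
    # The vector at the [CLS] position serves as the word embedding.
    return list(last_layer[0])


tokens = wrap_word("头痛")
# Placeholder vectors; in practice these come from the model's last layer.
fake_last_layer = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
word_vec = cls_embedding(tokens, fake_last_layer)
```

In real use, `fake_last_layer` would be replaced by the last-layer output that the BERT implementation in use returns for the wrapped input.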

Alternatively, you can follow the way bert-as-service computes it:

Q: How do you get the fixed representation? Did you do pooling or something?
A: Yes, pooling is required to get a fixed representation of a sentence. In the default strategy REDUCE_MEAN, I take the second-to-last hidden layer of all of the tokens in the sentence and do average pooling.

The "sentence" here can also be a single word.
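The pooling described in that answer could be sketched roughly as follows. This is an assumption-laden illustration rather than the actual bert-as-service code: `reduce_mean` is a hypothetical helper, the hidden states are passed in as nested Python lists (layers × tokens × dimensions), and whether [CLS] and [SEP] count toward the average is left to the caller through the mask.

```python
from typing import List, Sequence


def reduce_mean(all_layers: Sequence[Sequence[Sequence[float]]],
                mask: Sequence[int],
                layer_index: int = -2) -> List[float]:
    # Pick the second-to-last hidden layer by default.
    layer = all_layers[layer_index]
    if len(layer) != len(mask):
        raise ValueError("mask must have one entry per token")
    # Keep only the positions where the mask is non-zero (real tokens, not padding).
    kept = [vec for vec, keep in zip(layer, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    # Average each dimension over the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Placeholder hidden states: 3 layers, 3 token positions, 2 dimensions.
layers = [
    [[9.0, 9.0], [9.0, 9.0], [9.0, 9.0]],
    [[1.0, 2.0], [3.0, 6.0], [50.0, 50.0]],  # second-to-last layer
    [[7.0, 7.0], [7.0, 7.0], [7.0, 7.0]],
]
pooled = reduce_mean(layers, mask=[1, 1, 0])  # third position treated as padding
```

With Hugging Face Transformers, for instance, the per-layer outputs can be requested with `output_hidden_states=True` and converted to lists before pooling. That library is only one possible source of the hidden states and is not necessarily what this repository uses.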