BrikerMan/Kashgari

[Question] 分类采用bert embedding output_feature

shiwl0329 opened this issue · 4 comments

You must follow the issue template and provide as much information as possible; otherwise, this issue will be closed.

Check List

Thanks for considering opening an issue. Before you submit your issue, please confirm these boxes are checked.

You can post pictures, but if specific text or code is required to reproduce the issue, please provide the text in a plain text format for easy copy/paste.

Environment

  • OS [e.g. Mac OS, Linux]:
  • Python Version:
  • requirements.txt:
[Paste requirements.txt file here]

Question

A question: in BERT's run_classifier.py, the input vector is [batch_size, hidden_size], while the input to Kashgari's BiLSTM model is [batch_size, seq_len, hidden_size * 4].

Three questions: 1. Why does this differ so much from BERT's run_classifier.py? 2. Why take the last 4 layers? 3. For classification, shouldn't the [CLS] vector from the last layer be used directly as the input vector?

According to the last sentence of the second-to-last paragraph of the BERT paper, we take the concatenated outputs of the last four layers as the input to the subsequent layers.

For the feature-based approach, we concatenate the last 4 layers of BERT as the features, which was shown to be the best approach in Section 5.3.
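A minimal sketch of what that concatenation looks like, using simulated NumPy arrays in place of real BERT hidden states (the shapes match bert-base; this is an illustration, not Kashgari's actual embedding code):

```python
import numpy as np

# Simulated per-layer hidden states, matching bert-base shapes:
# 12 transformer layers, each [batch_size, seq_len, hidden_size].
batch_size, seq_len, hidden_size, num_layers = 2, 10, 768, 12
hidden_states = [np.random.rand(batch_size, seq_len, hidden_size)
                 for _ in range(num_layers)]

# Feature-based approach from the paper: concatenate the last 4 layers
# along the feature axis, giving [batch_size, seq_len, hidden_size * 4].
features = np.concatenate(hidden_states[-4:], axis=-1)
print(features.shape)  # (2, 10, 3072)
```

The resulting [batch_size, seq_len, hidden_size * 4] tensor is the shape the downstream BiLSTM consumes, which explains the input shape asked about above.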

Oh, I see. But then why does BERT's own classifier take only the [CLS] vector from the last layer followed by a single fully connected layer, which differs from what the paper says?
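For contrast, a sketch of the run_classifier.py-style pooling described here, again with simulated NumPy arrays rather than BERT's actual TensorFlow code: take the [CLS] position of the final layer and feed it through one fully connected layer.

```python
import numpy as np

batch_size, seq_len, hidden_size, num_labels = 2, 10, 768, 3
# Simulated final-layer output: [batch_size, seq_len, hidden_size]
last_layer = np.random.rand(batch_size, seq_len, hidden_size)

# [CLS] is the token at position 0; its vector is [batch_size, hidden_size],
# which matches the input shape seen in run_classifier.py.
cls_vector = last_layer[:, 0, :]

# A single fully connected layer producing the classification logits
# (weights are random placeholders here).
W = np.random.rand(hidden_size, num_labels)
b = np.zeros(num_labels)
logits = cls_vector @ W + b  # [batch_size, num_labels]
print(logits.shape)  # (2, 3)
```

This is the fine-tuning setup; the last-4-layer concatenation is the feature-based setup from Section 5.3 of the paper, so the two input shapes serve different approaches.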

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.