Why is it always E_num + 1?

Question

Why is it always E_num + 1?

Closed this issue 3 years ago · 9 comments

xhw205 commented 3 years ago

In multi-selection, what does E_num+1 represent? Is it the number of deduplicated entity types plus 2 separator tokens?

self.linear_start = nn.Linear(config.hidden_size, E_num + 1)
self.linear_end = nn.Linear(config.hidden_size, E_num + 1)

Answer 1 · 2021-11-17T10:10:15.000Z

Sorry, for this one you'll need to work through the algorithm yourself.

(Sent by email in reply to the follow-up question: "Also, what does subject_type_ids mean?")

Answer 2 · 2021-11-18T01:35:11.000Z

OK, I've read the paper "BERT-Based Multi-Head Selection for Joint Entity-Relation Extraction".

The "+1" stands for [CLS].
subject_type_ids is there to do the soft label embed.
Is my understanding correct?

Answer 3 · 2021-11-18T02:09:49.000Z

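The E_num in E_num + 1 is the number of entity labels; it has nothing to do with [CLS].
subject_type_ids, if I remember correctly, takes the entity information of the corresponding token and feeds it in as part of the input, for use in the relation extraction that follows.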

Answer 4 · 2021-11-18T02:33:07.000Z

Because there are also the I and O labels, I'd guess. I've forgotten it myself, haha.


Answer 5 · 2021-11-18T02:33:17.000Z

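On the first point: your E_num = len(args.s2id) already is the number of entity labels, so why add 1 to it for no apparent reason?
On the second point we agree: subject_type_ids does mark the end index of each subject and object with its entity label in order to do the soft label embed. That amounts to bringing in entity label information, because the default joint SPO extraction ignores the entity types of S and O.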

Answer 6 · 2021-11-18T02:35:28.000Z

OK, thanks.

Answer 7 · 2021-11-24T06:41:48.000Z

Same question here: why is the number of entities plus 1?

Answer 8 · 2021-11-24T10:15:17.000Z

got it, the plus 1 is for PAD

Answer 9 · 2021-11-25T00:49:19.000Z

s2id = {'Date': 1, 'Number': 2, 'Text': 3, '人物': 4, '企业': 5, '企业/品牌': 6, '作品': 7, '历史人物': 8, '国家': 9, '图书作品': 10, '地点': 11, '城市': 12, '奖项': 13, '娱乐人物': 14, '学校': 15, '学科专业': 16, '影视作品': 17, '文学作品': 18, '景点': 19, '机构': 20, '歌曲': 21, '气候': 22, '电视综艺': 23, '行政区': 24, '语言': 25, '音乐专辑': 26}
The ids start from 1, so the shape of subject_label is [Length, 2, len(s2id) + 1].
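
To make the indexing argument concrete, here is a minimal pure-Python sketch. It is not the repository's code: the shortened s2id and the one_hot helper are hypothetical, and treating index 0 as a free slot for padding or "no entity" is an assumption based on the PAD comment above. It only shows that 1-based ids need a last dimension of len(s2id) + 1.

```python
# Hypothetical, shortened label map; ids start at 1 as in the s2id above.
s2id = {"Date": 1, "Number": 2, "Text": 3}

E_num = len(s2id)            # number of entity types
max_id = max(s2id.values())  # equals E_num because ids are 1-based


def one_hot(label_id, size):
    """Return a list of length `size` with a 1 at position `label_id`."""
    if not 0 <= label_id < size:
        raise IndexError(f"label id {label_id} does not fit in size {size}")
    vec = [0] * size
    vec[label_id] = 1
    return vec


# With only E_num slots, the largest id has no valid position.
try:
    one_hot(max_id, E_num)
    fits_without_plus_one = True
except IndexError:
    fits_without_plus_one = False

# With E_num + 1 slots every id fits, and index 0 is never used by a real
# label (assumed here to be the slot left for padding / no entity).
row = one_hot(max_id, E_num + 1)
print(fits_without_plus_one, row)
```

The same reasoning would apply to the output size of linear_start and linear_end: a class axis of E_num + 1 lets the 1-based ids be used directly as indices without shifting them.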