Problem with MLM pre-training

Question

Problem with MLM pre-training

rattlesnakey opened this issue 3 years ago · 6 comments

When a MASK token is being predicted, it should not do self-attention with the other MASK tokens, so the attention_mask for the other MASK tokens needs to be 0.

Answer 1 · 2022-03-25T13:48:23.000Z

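Do you mean the implementation in the pretrain folder? I honestly didn't know it needed to be handled this way. Are you saying that in the attention matrix, MASK tokens should have their weights set to 0, the same as PAD tokens?
With my code as it is currently set up, does doing self-attention over them hurt the accuracy of the results?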

Answer 2 · 2022-03-25T13:50:07.000Z

Yes, that's right, I mean pretrain. It does have an effect; attention should treat them the same way as PAD tokens.
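The suggestion in this thread, giving MASK positions the same attention_mask value as PAD positions, could be sketched roughly as below. This is only an illustration of the idea as stated here, not necessarily how the repository implements its fix. The token ids and the helper name `build_attention_mask` are hypothetical, and the mask is a simple per-position (key-side) mask, so a MASK position is also hidden from itself.

```python
from typing import List

# Hypothetical special-token ids, chosen only for this illustration.
PAD_ID = 0
MASK_ID = 103


def build_attention_mask(input_ids: List[int],
                         pad_id: int = PAD_ID,
                         mask_id: int = MASK_ID) -> List[int]:
    """Return 1 for positions that may be attended to, 0 for PAD and MASK positions."""
    return [0 if tok in (pad_id, mask_id) else 1 for tok in input_ids]


# A made-up sequence with two MASK tokens and two trailing PAD tokens.
input_ids = [101, 2023, MASK_ID, 2003, MASK_ID, 102, PAD_ID, PAD_ID]
attention_mask = build_attention_mask(input_ids)
print(attention_mask)
```

The resulting list has the same length as `input_ids` and can be passed wherever the model expects a 0/1 padding-style attention mask.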

Answer 3 · 2022-03-25T13:51:10.000Z

I'll look into how to change it when I have time, or you could open a PR, haha.

Answer 4 · 2022-03-25T13:52:43.000Z

Sure, no problem. Your project is really nice. I'm also planning to pull together the code from projects I've worked on at some point, so I'm learning from you~

Answer 5 · 2022-03-25T13:56:03.000Z

Let's learn from each other ^ ^

Answer 6 · 2022-03-25T15:19:50.000Z

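Hi, I've put in a provisional fix for this issue; see the code and the update log in the README for the related changes. Thanks 🙏 for raising it. I'll add the related tests later (and a Star would be appreciated while you're here, hahaha 😂). If there are no further problems, I'll go ahead and close the issue.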