In the decoder's forward pass, why is the RNN's input only the label embedding and the state, without the context vector c(t) produced by attention? I don't understand this part.
Opened this issue · 5 comments
I don't quite follow the decoder forward pass in your code:
```python
def forward(self, inputs, init_state, contexts):
    if not self.config.global_emb:
        embs = self.embedding(inputs)
        outputs, state, attns = [], init_state, []
        for emb in embs.split(1):
            # RNN consumes only the target embedding and the previous state;
            # attention is applied to the RNN output afterwards
            output, state = self.rnn(emb.squeeze(0), state)
            output, attn_weights = self.attention(output, contexts)
            output = self.dropout(output)
            outputs += [output]
            attns += [attn_weights]
        outputs = torch.stack(outputs)
        attns = torch.stack(attns)
        return outputs, state
    else:
        outputs, state, attns = [], init_state, []
        embs = self.embedding(inputs).split(1)
        max_time_step = len(embs)
        emb = embs[0]
        output, state = self.rnn(emb.squeeze(0), state)
        output, attn_weights = self.attention(output, contexts)
        output = self.dropout(output)
        soft_score = F.softmax(self.linear(output))
        outputs += [output]
        attns += [attn_weights]
        batch_size = soft_score.size(0)
        a, b = self.embedding.weight.size()
        for i in range(max_time_step - 1):
            # soft_score-weighted average over the whole embedding matrix
            emb1 = torch.bmm(soft_score.unsqueeze(1),
                             self.embedding.weight.expand((batch_size, a, b)))
            emb2 = embs[i + 1]
            # gate between the averaged embedding and the ground-truth embedding
            gamma = F.sigmoid(self.gated1(emb1.squeeze()) + self.gated2(emb2.squeeze()))
            emb = gamma * emb1.squeeze() + (1 - gamma) * emb2.squeeze()
            output, state = self.rnn(emb, state)
            output, attn_weights = self.attention(output, contexts)
            output = self.dropout(output)
            soft_score = F.softmax(self.linear(output))
            outputs += [output]
            attns += [attn_weights]
        outputs = torch.stack(outputs)
        attns = torch.stack(attns)
        return outputs, state
```
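For readers puzzling over the `global_emb` branch: the softmax over the previous output is used to average the entire embedding matrix, and a learned sigmoid gate mixes that average with the ground-truth embedding of the next token. A minimal standalone sketch of just that gating step, with illustrative names (`gate1`, `gate2` stand in for the repo's `gated1`/`gated2`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gated_global_embedding(soft_score, emb_matrix, gold_emb, gate1, gate2):
    """Mix a probability-weighted average of all embeddings with the
    ground-truth token embedding via a sigmoid gate.

    soft_score: (batch, vocab)   softmax over the previous step's logits
    emb_matrix: (vocab, emb_dim) the shared embedding weight
    gold_emb:   (batch, emb_dim) embedding of the true next token
    """
    # expected embedding under the model's predictive distribution
    emb_avg = soft_score @ emb_matrix
    # per-dimension gate deciding how much of each source to keep
    gamma = torch.sigmoid(gate1(emb_avg) + gate2(gold_emb))
    return gamma * emb_avg + (1 - gamma) * gold_emb
```

The matrix product `soft_score @ emb_matrix` is equivalent to the batched `torch.bmm` with `expand` in the quoted code, just without materializing the expanded weight.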
The same question!!
I'd like to ask the same thing. The code doesn't seem to match the paper.
same question
The paper and the code are inconsistent. The implementation follows the method from Minh-Thang Luong's EMNLP 2015 paper, "Effective Approaches to Attention-based Neural Machine Translation".
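To illustrate the distinction the comment above is making: in Luong-style decoding (as in the quoted code), the RNN cell sees only the target embedding y(t-1) and the previous state, and the context vector c(t) is combined with the RNN output *after* the cell; in Bahdanau-style input feeding, c(t-1) would instead be concatenated into the cell's input. A minimal sketch of a Luong-style step with dot-product attention (class and attribute names here are illustrative, not from the repo):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LuongDecoderStep(nn.Module):
    """One decode step in the Luong (2015) style: attention is applied
    after the RNN cell, so the cell input is just the target embedding."""

    def __init__(self, emb_dim, hid_dim):
        super().__init__()
        self.cell = nn.GRUCell(emb_dim, hid_dim)
        self.attn_out = nn.Linear(2 * hid_dim, hid_dim)

    def forward(self, emb, state, contexts):
        # emb: (batch, emb_dim); state: (batch, hid); contexts: (batch, src_len, hid)
        state = self.cell(emb, state)                     # cell sees only y(t-1)
        scores = torch.bmm(contexts, state.unsqueeze(2))  # dot-product scores
        weights = F.softmax(scores.squeeze(2), dim=-1)    # (batch, src_len)
        c_t = torch.bmm(weights.unsqueeze(1), contexts).squeeze(1)
        # c(t) enters only here, merged with the RNN output
        output = torch.tanh(self.attn_out(torch.cat([state, c_t], dim=-1)))
        return output, state, weights
```

This matches the structure of the quoted `forward`: `self.rnn(emb, state)` followed by `self.attention(output, contexts)`, with no context vector fed back into the RNN input.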
I can see the attention module in the code is Luong attention, but is the decoding loop here also based on that paper?