In Equation 14 of the E3 paper, the summary vector C is computed from the extracted spans only (for each span, a sum over the tokens from s_i to e_i).
However, the implementation here attends over all unmasked input tokens when computing C:

    inp_attn_score = self.inp_attn_scorer(self.dropout(out['bert_enc'])).squeeze(2) - (1-out['input_mask'].float()).mul(1e20)
    inp_attn = F.softmax(inp_attn_score, dim=1).unsqueeze(2).expand_as(out['bert_enc']).mul(self.dropout(out['bert_enc'])).sum(1)
    out['clf_scores'] = self.class_clf(self.dropout(inp_attn))
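
To make the difference concrete, here is a minimal sketch of what restricting the attention to span tokens could look like. It is only my reading of the paper, not the authors' code: the helper names `span_token_mask` and `span_attention_pool`, the `spans` format (inclusive `(start, end)` index pairs per example), and the fallback to the full input mask when an example has no spans are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def span_token_mask(spans, seq_len):
    """Return a (batch, seq_len) float mask: 1 inside any span, 0 elsewhere.

    `spans` is a hypothetical format: one list per example holding
    inclusive (start, end) token index pairs.
    """
    mask = torch.zeros(len(spans), seq_len)
    for b, example_spans in enumerate(spans):
        for start, end in example_spans:
            mask[b, start:end + 1] = 1.0
    return mask


def span_attention_pool(enc, scores, input_mask, span_mask):
    """Attention-pool `enc` over tokens that are real AND inside a span."""
    keep = input_mask.float() * span_mask
    # Assumption: if an example has no span tokens, fall back to the plain
    # input mask so the softmax never sees a fully masked row.
    has_span = keep.sum(dim=1, keepdim=True) > 0
    keep = torch.where(has_span, keep, input_mask.float())
    masked_scores = scores.masked_fill(keep == 0, float("-inf"))
    attn = F.softmax(masked_scores, dim=1)
    return (attn.unsqueeze(2) * enc).sum(dim=1)


# Toy inputs standing in for out['bert_enc'], the scorer output,
# and out['input_mask'].
torch.manual_seed(0)
enc = torch.randn(2, 6, 4)
scores = torch.randn(2, 6)
input_mask = torch.tensor([[1, 1, 1, 1, 1, 0], [1, 1, 1, 1, 0, 0]])
spans = [[(1, 2), (4, 4)], [(0, 0)]]

span_mask = span_token_mask(spans, seq_len=6)
pooled = span_attention_pool(enc, scores, input_mask, span_mask)
```

With this masking, tokens outside the extracted spans get zero attention weight, whereas the quoted code only zeroes out padding. Whether the paper also intends a per-span weighting on top of this is something I am not sure about.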