soobinseo/Transformer-TTS

About FFN in your code

Opened this issue · 5 comments

hhguo commented
def forward(self, input_):
    # FFN Network
    x = input_.transpose(1, 2) 
    x = self.w_2(t.relu(self.w_1(x))) 
    x = x.transpose(1, 2) 

    # residual connection
    x = x + input_ 

    # dropout
    x = self.dropout(x) 

    # layer normalization
    x = self.layer_norm(x) 

    return x

Should dropout be placed before x + input_?
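
For context, the ordering being asked about would look roughly like the sketch below. This is a hypothetical, minimal module, not the repo's actual code: the names and shapes (num_hidden, the 4x expansion, kernel_size=1 convolutions) are assumptions based on the transposes in the snippet above.

import torch as t
import torch.nn as nn

class FFN(nn.Module):
    # Position-wise feed-forward sub-layer with dropout applied to the
    # sub-layer output before the residual connection and layer norm.
    def __init__(self, num_hidden, dropout_p=0.1):
        super().__init__()
        self.w_1 = nn.Conv1d(num_hidden, num_hidden * 4, kernel_size=1)
        self.w_2 = nn.Conv1d(num_hidden * 4, num_hidden, kernel_size=1)
        self.dropout = nn.Dropout(p=dropout_p)
        self.layer_norm = nn.LayerNorm(num_hidden)

    def forward(self, input_):
        # FFN network: Conv1d expects (batch, channels, time)
        x = input_.transpose(1, 2)
        x = self.w_2(t.relu(self.w_1(x)))
        x = x.transpose(1, 2)

        # dropout on the sub-layer output, before the residual add
        x = self.dropout(x)

        # residual connection, then layer normalization
        x = self.layer_norm(x + input_)

        return x

With this ordering only the sub-layer output is dropped out; the residual path carrying input_ is left untouched.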

Hi, I will experiment with your advice. Thank you.

I experimented with your advice, but the diagonal alignment did not appear correctly in the attention plot.

hhguo commented

It's weird, because in the Transformer the residual dropout should be applied before the residual connection.

In the paper:
Residual Dropout:
We apply dropout [27] to the output of each sub-layer, before it is added to the sub-layer input and normalized.
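
Schematically, with x as the sub-layer input and Sublayer as the two-layer FFN, the two orderings are:

    paper (residual dropout):  out = LayerNorm(x + Dropout(Sublayer(x)))
    snippet in this issue:     out = LayerNorm(Dropout(x + Sublayer(x)))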

...I think it's weird too... I should try some more experiments with dropout.

hhguo commented