About FFN in your code
Opened this issue · 5 comments
hhguo commented
def forward(self, input_):
    # FFN network
    x = input_.transpose(1, 2)
    x = self.w_2(t.relu(self.w_1(x)))
    x = x.transpose(1, 2)
    # residual connection
    x = x + input_
    # dropout
    x = self.dropout(x)
    # layer normalization
    x = self.layer_norm(x)
    return x
Should dropout be placed before the residual connection (x = x + input_)?
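For comparison, here is a minimal sketch of the ordering described in the Transformer paper (dropout applied to the FFN output first, then the residual add and layer normalization), reusing the same w_1, w_2, dropout, and layer_norm modules as in the code above:

def forward(self, input_):
    # FFN network
    x = input_.transpose(1, 2)
    x = self.w_2(t.relu(self.w_1(x)))
    x = x.transpose(1, 2)
    # paper-style residual dropout: drop the sub-layer output first,
    # then add the residual and normalize
    x = self.dropout(x)
    x = x + input_
    x = self.layer_norm(x)
    return x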
soobinseo commented
Hi, I will experiment with your advice. Thank you.
soobinseo commented
I experimented with your advice, but the diagonal alignment did not appear correctly in the attention plot.
hhguo commented
That's weird, because in the Transformer the residual dropout should be applied before the residual connection.
soobinseo commented
In the paper :
Residual Dropout:
We apply dropout [27] to the output of each sub-layer, before it is added to the sub-layer input and normalized.
...I think it's weird too... I should try some more experiments with dropout.
hhguo commented
Yes.
At least it shows that dropout is very important in Transformer-TTS, but it should be used in a different way. Thanks for sharing.