About FFN in your code
Opened this issue · 5 comments
hhguo commented
def forward(self, input_):
    # FFN network
    x = input_.transpose(1, 2)
    x = self.w_2(t.relu(self.w_1(x)))
    x = x.transpose(1, 2)
    # residual connection
    x = x + input_
    # dropout
    x = self.dropout(x)
    # layer normalization
    x = self.layer_norm(x)
    return x
Should dropout be placed before the residual connection (x = x + input_)?
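For comparison, here is a minimal sketch of the ordering described in the Transformer paper (dropout applied to the FFN output first, then the residual add and layer normalization), reusing the same w_1, w_2, dropout, and layer_norm modules as in the code above:

def forward(self, input_):
    # FFN network
    x = input_.transpose(1, 2)
    x = self.w_2(t.relu(self.w_1(x)))
    x = x.transpose(1, 2)
    # paper-style residual dropout: drop the sub-layer output first,
    # then add the residual and normalize
    x = self.dropout(x)
    x = x + input_
    x = self.layer_norm(x)
    return x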
soobinseo commented
Hi, I will experiment with your advice. Thank you.
soobinseo commented
I experimented with your advice, but the diagonal alignment did not appear correctly in the attention plot.
hhguo commented
That's weird, because in the Transformer the residual dropout should be applied before the residual connection.
soobinseo commented
In the paper :
Residual Dropout:
We apply dropout [27] to the output of each sub-layer, before it is added to the sub-layer input and normalized.
...I think it's weird too... I should try some more experiments with dropout.
hhguo commented
Yes.
At least it shows that dropout is very important in Transformer-TTS, but it should be used in a different way. Thanks for sharing.