ServerSideHannes/las

Regarding the class att_rnn


Hi, this is Yong Joon Lee. I am implementing a LAS model based on your code. I know you might not remember the details, since you wrote this code three years ago, but I think I found a small mistake in the ordering of the class att_rnn's call method: you update s twice in a row and only then compute c, the attention context.

Your ordering is as follows:

s       = self.rnn(inputs = inputs, states = states) # s = m_{t}, [m_{t}, c_{t}] #m is memory(hidden) and c is carry(cell)
s       = self.rnn2(inputs=s[0], states = s[1])[1] # s = m_{t+1}, c_{t+1}
c       = self.attention_context([s[0], h])

But isn't it supposed to be as follows?

s       = self.rnn(inputs = inputs, states = states) # s = m_{t}, [m_{t}, c_{t}]
c       = self.attention_context([s[0], h]) 
s       = self.rnn2(inputs=s[0], states = s[1])[1] # s = m_{t+1}, c_{t+1}

As the original paper describes, the attention context vector at timestep t is obtained by applying attention to s_t and h, where h is the output of the pBLSTM listener. With your ordering, however, the attention context vector is derived from s_{t+1} and h instead. Thank you for your great work.
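
For clarity, here is a minimal sketch of one decoder step with the proposed ordering. It assumes Keras-style LSTM cells and a hypothetical attention_context callable standing in for your attention layer; the names mirror the snippet above, but this is not your exact code.

def decoder_step(rnn_cell, rnn_cell2, attention_context, inputs, states, h):
    # Sketch only (assumed interfaces): rnn_cell and rnn_cell2 are Keras-style LSTM cells,
    # attention_context computes the context from [s_t, h], inputs is the previous output
    # embedding, states are the first cell's [m_t, c_t], and h is the pBLSTM (listener) output.

    # First decoder cell: returns (m_t, [m_t, c_t]).
    out, new_states = rnn_cell(inputs, states=states)

    # Attention context computed from s_t and h, i.e. before the second cell updates s,
    # matching c_t = AttentionContext(s_t, h) in the paper.
    context = attention_context([out, h])

    # Second decoder cell consumes the first cell's output, as in the snippet above.
    out2, new_states2 = rnn_cell2(out, states=new_states)

    return out2, new_states2, context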