VikParuchuri/zero_to_gpt

hidden state index at t=1 during backprop

mattiasospetti opened this issue

During backpropagation, at t=2, when he calculates the gradient of the hidden weight matrix, he writes this:

h_weight_grad += hiddens[1,:][:,np.newaxis] @ h2_grad

This uses the hidden state at t=1. But later, when he backprops at t=1, he writes:

h_weight_grad += hiddens[1,:][:,np.newaxis] @ h1_grad

It's not clear to me why he uses the t=1 hidden state again: shouldn't it be hiddens[0,:], since that gradient depends on the hidden state from the previous step?
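For reference, here is a minimal sketch of the indexing I would expect, assuming hiddens[t] holds the hidden state after step t and h_grads[t] holds the gradient flowing into the hidden state at step t (these names and shapes are mine, not from the notebook):

import numpy as np

seq_len, hidden_size = 3, 4
hiddens = np.random.rand(seq_len, hidden_size)  # hiddens[t]: hidden state at step t (assumed layout)
h_grads = np.random.rand(seq_len, hidden_size)  # h_grads[t]: gradient into the hidden state at step t

h_weight_grad = np.zeros((hidden_size, hidden_size))
for t in range(seq_len - 1, 0, -1):
    # Pair the gradient at step t with the hidden state from step t-1,
    # so t=2 uses hiddens[1,:] and t=1 uses hiddens[0,:].
    h_weight_grad += hiddens[t - 1, :][:, np.newaxis] @ h_grads[t, :][np.newaxis, :]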

Thanks in advance to anyone who can help.