a new way to backprop C[Xb] instead of a for loop
haduoken commented
@karpathy
While watching your zero_to_hero series on YouTube (which I think is awesome),
I came up with a new idea for the backprop of C[Xb] (I have also replied on YouTube).
The original method looks like this:
# accumulate row by row: every occurrence of index ix in Xb
# adds the corresponding row of demb into dC[ix]
dC = torch.zeros_like(C)
for k in range(Xb.shape[0]):
    for j in range(Xb.shape[1]):
        ix = Xb[k, j]
        dC[ix] += demb[k, j]
My method looks like this:
# pass num_classes so the one-hot always has C.shape[0] columns,
# even if the largest index does not appear in this batch
dC = (F.one_hot(Xb, num_classes=C.shape[0]).float().transpose(1, 2) @ demb).sum(0)
and I checked that the gradient matches.
This works because indexing with Xb is equivalent to multiplying by a one-hot matrix (C[Xb] == F.one_hot(Xb).float() @ C), so we can just apply the ordinary backprop rule for matrix multiplication.
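A minimal self-contained sketch of the comparison, using toy shapes and random tensors (batch_size, block_size, vocab_size, n_embd are placeholders here, not the values from the video):

import torch
import torch.nn.functional as F

batch_size, block_size, vocab_size, n_embd = 4, 3, 27, 10
C = torch.randn(vocab_size, n_embd)                      # embedding table
Xb = torch.randint(0, vocab_size, (batch_size, block_size))
demb = torch.randn(batch_size, block_size, n_embd)       # upstream gradient

# loop version: scatter each row of demb into dC
dC_loop = torch.zeros_like(C)
for k in range(Xb.shape[0]):
    for j in range(Xb.shape[1]):
        dC_loop[Xb[k, j]] += demb[k, j]

# one-hot version: (B, V, T) @ (B, T, E) -> (B, V, E), then sum over the batch
dC_ohe = (F.one_hot(Xb, num_classes=vocab_size).float().transpose(1, 2) @ demb).sum(0)

print(torch.allclose(dC_loop, dC_ohe))  # True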
afrozenator commented
+1, it can be written as:
# scatter the gradient via OHE
Xb_ohe = F.one_hot(Xb, num_classes=vocab_size).float()
dC = torch.einsum('ble,blv->ve', demb, Xb_ohe)
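For reference, the same scatter-add can also be done without materializing the one-hot tensor, using PyTorch's index_add_ (a minimal sketch, assuming Xb and demb have the shapes used above):

# flatten the (batch, block) indices and add the matching rows of demb into dC
dC = torch.zeros_like(C)
dC.index_add_(0, Xb.view(-1), demb.view(-1, C.shape[1]))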