a new way to backprop C[Xb] instead of a for loop
haduoken commented
@karpathy
While watching your zero_to_hero series on YouTube (which I think is awesome),
I came up with a new idea for the backprop of C[Xb] (I have also replied on YouTube).
The original method looks like this:
# accumulate row by row: every occurrence of index ix in Xb
# adds the corresponding row of demb into dC[ix]
dC = torch.zeros_like(C)
for k in range(Xb.shape[0]):
    for j in range(Xb.shape[1]):
        ix = Xb[k, j]
        dC[ix] += demb[k, j]
My method looks like this:
# pass num_classes so the one-hot always has C.shape[0] columns,
# even if the largest index does not appear in this batch
dC = (F.one_hot(Xb, num_classes=C.shape[0]).float().transpose(1, 2) @ demb).sum(0)
and I checked that the gradient matches.
This works because indexing with Xb is equivalent to multiplying by a one-hot matrix (C[Xb] == F.one_hot(Xb).float() @ C), so we can just apply the ordinary backprop rule for matrix multiplication.
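A minimal self-contained sketch of the comparison, using toy shapes and random tensors (batch_size, block_size, vocab_size, n_embd are placeholders here, not the values from the video):

import torch
import torch.nn.functional as F

batch_size, block_size, vocab_size, n_embd = 4, 3, 27, 10
C = torch.randn(vocab_size, n_embd)                      # embedding table
Xb = torch.randint(0, vocab_size, (batch_size, block_size))
demb = torch.randn(batch_size, block_size, n_embd)       # upstream gradient

# loop version: scatter each row of demb into dC
dC_loop = torch.zeros_like(C)
for k in range(Xb.shape[0]):
    for j in range(Xb.shape[1]):
        dC_loop[Xb[k, j]] += demb[k, j]

# one-hot version: (B, V, T) @ (B, T, E) -> (B, V, E), then sum over the batch
dC_ohe = (F.one_hot(Xb, num_classes=vocab_size).float().transpose(1, 2) @ demb).sum(0)

print(torch.allclose(dC_loop, dC_ohe))  # True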
afrozenator commented
+1, it can be written as:
# scatter the gradient via OHE
Xb_ohe = F.one_hot(Xb, num_classes=vocab_size).float()
dC = torch.einsum('ble,blv->ve', demb, Xb_ohe)
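For reference, the same scatter-add can also be done without materializing the one-hot tensor, using PyTorch's index_add_ (a minimal sketch, assuming Xb and demb have the shapes used above):

# flatten the (batch, block) indices and add the matching rows of demb into dC
dC = torch.zeros_like(C)
dC.index_add_(0, Xb.view(-1), demb.view(-1, C.shape[1]))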