huggingface/pytorch-openai-transformer-lm

In `transform_roc`, why do we need `xmb[:, :, :, 1]`?

FrankWork opened this issue · 3 comments

def transform_roc(X1, X2, X3):
    # n_ctx, max_len, encoder, clf_token, n_vocab and n_special are defined
    # at module level in the training script.
    n_batch = len(X1)
    xmb = np.zeros((n_batch, 2, n_ctx, 2), dtype=np.int32)
    mmb = np.zeros((n_batch, 2, n_ctx), dtype=np.float32)
    start = encoder['_start_']
    delimiter = encoder['_delimiter_']
    for i, (x1, x2, x3) in enumerate(zip(X1, X2, X3)):
        x12 = [start] + x1[:max_len] + [delimiter] + x2[:max_len] + [clf_token]
        x13 = [start] + x1[:max_len] + [delimiter] + x3[:max_len] + [clf_token]
        l12 = len(x12)
        l13 = len(x13)
        # Channel 0 holds the token ids of the two candidate sequences.
        xmb[i, 0, :l12, 0] = x12
        xmb[i, 1, :l13, 0] = x13
        # mmb masks the real (non-padding) positions.
        mmb[i, 0, :l12] = 1
        mmb[i, 1, :l13] = 1
    # Channel 1 holds the position ids, offset past the vocabulary.
    xmb[:, :, :, 1] = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)
    return xmb, mmb

This part of `xmb` holds the indices used for the learned positional encoding.
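To make it concrete, here is a minimal sketch with toy sizes (the values of `n_batch`, `n_ctx`, `n_vocab` and `n_special` below are made up for illustration, not the ones used in the repository) showing what that last assignment writes into channel 1:

import numpy as np

n_batch, n_ctx, n_vocab, n_special = 2, 5, 10, 3  # toy sizes for illustration
xmb = np.zeros((n_batch, 2, n_ctx, 2), dtype=np.int32)

# Channel 0 holds token ids in [0, n_vocab + n_special); channel 1 holds position
# ids offset past the vocabulary, so both can index rows of a single embedding matrix.
xmb[:, :, :, 1] = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)
print(xmb[0, 0, :, 1])  # -> [13 14 15 16 17]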

The network learns an embedding vector for each position of the input, and this vector is added to the corresponding word embedding during the forward pass of the network.

def forward(self, x):
    # x: (n_batch, 2, n_ctx, 2) -> (n_batch * 2, n_ctx, 2)
    x = x.view(-1, x.size(-2), x.size(-1))
    # e: (n_batch * 2, n_ctx, 2, n_embd), one vector for the token id and one for the position id
    e = self.embed(x)
    # Summing over the channel dimension adds the position embedding to the word embedding.
    h = e.sum(dim=2)
    for block in self.h:
        h = block(h)
    return h

The line `h = e.sum(dim=2)` performs this addition.
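As a sanity check, here is a minimal end-to-end sketch (again with toy sizes, and a plain `nn.Embedding` standing in for the model's embedding layer) showing that the sum over `dim=2` is exactly word embedding plus position embedding:

import torch
import torch.nn as nn

n_batch, n_ctx, n_vocab, n_special, n_embd = 4, 5, 10, 3, 8  # toy sizes
# Tokens, special symbols and positions share a single embedding table.
embed = nn.Embedding(n_vocab + n_special + n_ctx, n_embd)

xmb = torch.randint(0, n_vocab + n_special, (n_batch, 2, n_ctx, 2))
xmb[..., 1] = torch.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)

x = xmb.view(-1, xmb.size(-2), xmb.size(-1))  # (n_batch * 2, n_ctx, 2)
e = embed(x)                                  # (n_batch * 2, n_ctx, 2, n_embd)
h = e.sum(dim=2)                              # (n_batch * 2, n_ctx, n_embd)

assert torch.allclose(h, embed(x[..., 0]) + embed(x[..., 1]))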

@rodgzilla Thank you! You saved my day!
