IpsumDominum/Pytorch-Simple-Transformer

Saving and loading does not work...


Setting LOAD=50 does not reproduce the old loss when resuming from checkpoint.pkl.

import torch.nn as nn

# TransformerBlock and PositionalEncoding are defined elsewhere in this repo.

class Encoder(nn.Module):
    def __init__(self, embed_dim, num_heads, num_blocks, CUDA=False):
        super(Encoder, self).__init__()
        # Plain Python list: these sub-modules are NOT registered with PyTorch.
        self.transformer_blocks = [
            TransformerBlock(embed_dim, num_heads, mask=False, CUDA=CUDA) for _ in range(num_blocks)
        ]
        ### Added this to fix the state_dict() hierarchy
        [setattr(self, 'transformer_block_%d' % xi, xxx) for xi, xxx in enumerate(self.transformer_blocks)]
        self.positional_encoding = PositionalEncoding(embed_dim)

    def forward(self, x):
        # x = self.positional_encoding(x)
        for block in self.transformer_blocks:
            x = block(x, x, x, x)
        return x

This problem is actually critical. Because assigning the blocks as a plain Python list attribute prevents PyTorch from registering them as sub-modules, their parameters never appear in parameters() or state_dict(), so the blocks are never updated by the optimizer and never saved in the checkpoint.
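For illustration, here is a minimal, self-contained sketch (independent of this repo's classes) showing how a plain list hides sub-modules from PyTorch, while nn.ModuleList does not:

import torch.nn as nn

class PlainListModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: the Linear layers are NOT registered as sub-modules.
        self.blocks = [nn.Linear(4, 4) for _ in range(2)]

class ModuleListModel(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each layer, so it shows up in
        # parameters() and state_dict().
        self.blocks = nn.ModuleList([nn.Linear(4, 4) for _ in range(2)])

print(len(list(PlainListModel().parameters())))   # 0 -> the optimizer sees nothing
print(len(PlainListModel().state_dict()))         # 0 -> the checkpoint saves nothing
print(len(list(ModuleListModel().parameters())))  # 4 (weight + bias per layer)
print(len(ModuleListModel().state_dict()))        # 4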

@shouldsee Thanks for pointing that out! You are right, that is critical.

A new commit should fix this. I've changed the block lists to nn.ModuleList([]), so the blocks are registered properly and show up in state_dict().
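A minimal sketch of what the fixed Encoder could look like with nn.ModuleList (TransformerBlock and PositionalEncoding are assumed from this repo; exact signatures may differ):

import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, embed_dim, num_heads, num_blocks, CUDA=False):
        super(Encoder, self).__init__()
        # nn.ModuleList registers each block, so parameters() and
        # state_dict() include them and checkpoints round-trip correctly.
        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(embed_dim, num_heads, mask=False, CUDA=CUDA)
            for _ in range(num_blocks)
        ])
        self.positional_encoding = PositionalEncoding(embed_dim)

    def forward(self, x):
        for block in self.transformer_blocks:
            x = block(x, x, x, x)
        return x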

Please let me know of any other issues :)