Saving and loading does not work...
shouldsee commented
Setting LOAD=50 does not reproduce the old loss when resuming from checkpoint.pkl.
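For context, this is a minimal sketch of the save/load round-trip that should reproduce the loss on resume (the model, optimizer, and filename here are illustrative, not the repo's actual training code):

```python
import torch
import torch.nn as nn

# Toy model and optimizer, purely for illustration.
model = nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters())

# Save: persist model and optimizer state together so training resumes exactly.
torch.save({
    'model_state': model.state_dict(),
    'optimizer_state': optimizer.state_dict(),
}, 'checkpoint.pkl')

# Load: restore both states before continuing training.
ckpt = torch.load('checkpoint.pkl')
model.load_state_dict(ckpt['model_state'])
optimizer.load_state_dict(ckpt['optimizer_state'])
```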
shouldsee commented
```python
import torch.nn as nn

# TransformerBlock and PositionalEncoding are defined elsewhere in this repo.

class Encoder(nn.Module):
    def __init__(self, embed_dim, num_heads, num_blocks, CUDA=False):
        super(Encoder, self).__init__()
        self.transformer_blocks = [
            TransformerBlock(embed_dim, num_heads, mask=False, CUDA=CUDA)
            for _ in range(num_blocks)
        ]
        ### Added this to fix state_dict() hierarchy:
        ### register each block as a named attribute so it shows up in state_dict()
        for xi, block in enumerate(self.transformer_blocks):
            setattr(self, 'transformer_block_%d' % xi, block)
        self.positional_encoding = PositionalEncoding(embed_dim)

    def forward(self, x):
        # x = self.positional_encoding(x)
        for block in self.transformer_blocks:
            x = block(x, x, x, x)
        return x
```
shouldsee commented
This problem is actually critical: assigning the blocks as a plain Python list attribute prevents PyTorch from registering them as submodules, so their parameters are invisible to `parameters()` and `state_dict()` and the optimizer never updates them.
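A minimal illustration of the failure mode, using toy linear layers rather than the repo's classes:

```python
import torch.nn as nn

class Bad(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain list: the layers are NOT registered as submodules.
        self.layers = [nn.Linear(4, 4) for _ in range(3)]

class Good(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: the layers ARE registered as submodules.
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

print(len(list(Bad().parameters())))   # 0 -- optimizer and state_dict() see nothing
print(len(list(Good().parameters())))  # 6 -- 3 weights + 3 biases
```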
IpsumDominum commented
@shouldsee Thanks for pointing that out! You are right, that is critical.
The new commit should fix this. I've changed the block lists to nn.ModuleList([]), which should fix the save/load problem as well.
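For reference, the registered version of the constructor looks roughly like this (a sketch based on the snippet above, not necessarily the exact commit):

```python
class Encoder(nn.Module):
    def __init__(self, embed_dim, num_heads, num_blocks, CUDA=False):
        super(Encoder, self).__init__()
        # nn.ModuleList registers every block, so parameters(), state_dict(),
        # and checkpoint save/load all see the transformer blocks.
        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(embed_dim, num_heads, mask=False, CUDA=CUDA)
            for _ in range(num_blocks)
        ])
        self.positional_encoding = PositionalEncoding(embed_dim)
```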
Please let me know of any other issues :)