Kaixhin/PlaNet

allow other embedding-size besides 1024

gyom opened this issue · 5 comments

gyom commented

We're using this implementation for a research project, and we've run into problems when trying to use values of embedding-size other than 1024. Can you give us some hints about what might be causing this?

pybullet build time: Jun 20 2019 15:31:37
Traceback (most recent call last):
  File "/current_project/robo-planet/torch_planet/main.py", line 233, in <module>
    beliefs, prior_states, prior_means, prior_std_devs, posterior_states, posterior_means, posterior_std_devs = transition_model(init_state, actions[:-1], init_belief, bottle(encoder, (observations[1:], )), nonterminals[:-1])
  File "/current_project/robo-planet/torch_planet/models.py", line 10, in bottle
    y = f(*map(lambda x: x[0].view(x[1][0] * x[1][1], *x[1][2:]), zip(x_tuple, x_sizes)))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
RuntimeError: 
shape '[-1, 4096]' is invalid for input of size 2508800:
operation failed in interpreter:

            return torch.transpose(self, dim0, dim1), backward

        def view(self,
                 size: List[int]):
            self_size = self.size()
            def backward(grad_output):
                return grad_output.reshape(self_size), None

            return torch.view(self, size), backward
                   ~~~~~~~~~~ <--- HERE

Hey Guillaume, glad to hear you're using it :) I've addressed this problem in the commit above by adding a linear projection layer if the embedding size isn't 1024, which makes the code more flexible, but the underlying problem is that 1024 comes directly from the convolutional architecture of the encoder. So if you're planning to control this more finely I'd just hack in whatever architectures work best for your problem.
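For concreteness, the failing reshape is consistent with that: 2508800 = 2450 × 1024, i.e. the convolutional stack always produces 1024 features per observation, which can't be viewed in chunks of 4096 (presumably the embedding size requested here). Below is a rough sketch of the projection fix, assuming the World Models-style encoder on 64×64 RGB observations; the layer sizes are illustrative and this is not necessarily the exact commit:

    from torch import nn
    from torch.nn import functional as F

    class VisualEncoder(nn.Module):
        def __init__(self, embedding_size):
            super().__init__()
            # Four stride-2 convolutions on a 64x64 input leave a 256 x 2 x 2
            # feature map, i.e. 1024 features per observation -- the source of
            # the hard-coded 1024.
            self.conv1 = nn.Conv2d(3, 32, 4, stride=2)
            self.conv2 = nn.Conv2d(32, 64, 4, stride=2)
            self.conv3 = nn.Conv2d(64, 128, 4, stride=2)
            self.conv4 = nn.Conv2d(128, 256, 4, stride=2)
            # Linear projection only when the requested embedding size differs
            # from what the convolutions produce.
            self.fc = nn.Identity() if embedding_size == 1024 else nn.Linear(1024, embedding_size)

        def forward(self, observation):
            hidden = F.relu(self.conv1(observation))
            hidden = F.relu(self.conv2(hidden))
            hidden = F.relu(self.conv3(hidden))
            hidden = F.relu(self.conv4(hidden))
            hidden = hidden.view(-1, 1024)  # flatten 256 x 2 x 2 per observation
            return self.fc(hidden)

With something like this in place, embedding-size only changes the size of the projected feature vector; the convolutional part stays fixed.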

Debugging tip: the JIT makes errors harder to read, so if you're running into problems, replace jit.ScriptModule with nn.Module and delete the @jit.script_method decorators.
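In code, that swap looks roughly like this (a minimal sketch; the class name is a placeholder, only the base class and decorator change):

    from torch import jit, nn

    # Scripted version: errors surface through the TorchScript interpreter,
    # as in the traceback above.
    class ScriptedEncoder(jit.ScriptModule):
        @jit.script_method
        def forward(self, observation):
            return observation.view(-1, 1024)

    # Plain version for debugging: same forward, but errors come out as
    # ordinary Python tracebacks.
    class PlainEncoder(nn.Module):
        def forward(self, observation):
            return observation.view(-1, 1024)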

@Kaixhin we were curious because 1024 features seemed like a lot to us. Have you tried smaller values?

Nope I haven't played around with this (agreed that it could probably be smaller). The encoder and decoder architectures seem to have been taken directly from World Models.

> The encoder and decoder architectures seem to have been taken directly from World Models.

That is what they claimed in the article. David Ha was also on both papers.