thuml/predrnn-pytorch

misconfigured parameter `num_action_ch` for `action_cond_predrnn_v2`

Flunzmas opened this issue · 2 comments

Hi there,

I am having issues using the action-conditional PredRNNV2 for inference.

As far as I can tell, it works like this (with action_injection=concat): load the actions, grid-repeat them, and concatenate the video data and the resulting action tensor channel-wise. Then apply reshape_patch(), which yields a tensor of shape [batch, seq_length, height // patch_size, width // patch_size, (img_ch + action_ch) * patch_size ** 2], and pass that tensor to the model.
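To make the shape above concrete, here is a minimal NumPy sketch of the pipeline as assumed in this question. `reshape_patch_sketch` is a hypothetical stand-in written for illustration, not the repository's reshape_patch(), and the sizes (64x64 frames, img_ch=3, action_ch=4, patch_size=4) are made-up example values.

```python
import numpy as np

def reshape_patch_sketch(x, patch_size):
    """Hypothetical stand-in: fold each patch_size x patch_size block into the channel axis."""
    b, t, h, w, c = x.shape
    x = x.reshape(b, t, h // patch_size, patch_size, w // patch_size, patch_size, c)
    x = x.transpose(0, 1, 2, 4, 3, 5, 6)
    return x.reshape(b, t, h // patch_size, w // patch_size, patch_size * patch_size * c)

# Example sizes (assumptions, not taken from the repository configs).
batch, seq_length, height, width = 2, 5, 64, 64
img_ch, action_ch, patch_size = 3, 4, 4

frames = np.zeros((batch, seq_length, height, width, img_ch), dtype=np.float32)
actions = np.zeros((batch, seq_length, action_ch), dtype=np.float32)

# Grid-repeat each action vector over the full frame resolution.
action_grid = np.broadcast_to(
    actions[:, :, None, None, :],
    (batch, seq_length, height, width, action_ch),
)

# Concatenate channel-wise BEFORE patching (the assumption made in this question).
combined = np.concatenate([frames, action_grid], axis=-1)
patched = reshape_patch_sketch(combined, patch_size)
print(patched.shape)  # (2, 5, 16, 16, 112), i.e. (3 + 4) * 4 ** 2 channels
```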

In the action-conditional PredRNNV2 model, however, the parameter num_action_ch is used directly as the number of input channels of the conv layers, instead of num_action_ch * patch_size ** 2. For me, this leads to shape mismatches at runtime in forward(). Is this a bug, or did I misunderstand something?

Hi,

(1) num_action_ch is equal to the dimension of the actions. We expand the action to the size (height // patch_size, width // patch_size).
(2) The repatch_back is applied only to the frames. See lines 135-137 in ./core/models/action_cond_predrnn_v2.py
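The two points above can be sketched in NumPy as follows. This is only an illustration of the described shapes, not the repository's code: the sizes are made-up example values, and the tensors are zero-filled placeholders.

```python
import numpy as np

# Example sizes (assumptions for illustration only).
batch, seq_length, height, width = 2, 5, 64, 64
img_ch, num_action_ch, patch_size = 3, 4, 4
h_p, w_p = height // patch_size, width // patch_size

# Frames AFTER patching: only the image channels are multiplied by patch_size ** 2.
frames_patched = np.zeros(
    (batch, seq_length, h_p, w_p, img_ch * patch_size ** 2), dtype=np.float32
)
actions = np.zeros((batch, seq_length, num_action_ch), dtype=np.float32)

# Expand each action vector to the patched resolution (height // patch_size, width // patch_size).
action_grid = np.broadcast_to(
    actions[:, :, None, None, :],
    (batch, seq_length, h_p, w_p, num_action_ch),
)

# Concatenating after patching adds exactly num_action_ch channels.
model_input = np.concatenate([frames_patched, action_grid], axis=-1)
print(model_input.shape)  # (2, 5, 16, 16, 52)

channels_described = img_ch * patch_size ** 2 + num_action_ch   # 52
channels_assumed = (img_ch + num_action_ch) * patch_size ** 2   # 112, the count assumed in the question
print(channels_described, channels_assumed)
```

Under these assumptions the action contributes num_action_ch channels rather than num_action_ch * patch_size ** 2, which is consistent with the conv layers using num_action_ch directly.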

Thanks for the quick answer!

I see where I went wrong: in the action-conditional case, the expanded action is concatenated to the frames after reshape_patch() and stripped from the result before reshape_patch_back().
I had looked at the shape returned in, e.g., core/data_provider/bair.py and assumed that the actions are included in the input to reshape_patch().
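For anyone who runs into the same confusion, here is a small NumPy sketch of that ordering: patch the frames, append the action channels, strip them again, and undo the patching. Both helper functions are hypothetical stand-ins written for this example (not the repository's reshape_patch() / reshape_patch_back()), and the sizes are arbitrary.

```python
import numpy as np

def reshape_patch_sketch(x, p):
    """Hypothetical stand-in: fold each p x p block into the channel axis."""
    b, t, h, w, c = x.shape
    x = x.reshape(b, t, h // p, p, w // p, p, c).transpose(0, 1, 2, 4, 3, 5, 6)
    return x.reshape(b, t, h // p, w // p, p * p * c)

def reshape_patch_back_sketch(x, p):
    """Hypothetical stand-in: inverse of reshape_patch_sketch."""
    b, t, hp, wp, cp = x.shape
    c = cp // (p * p)
    x = x.reshape(b, t, hp, wp, p, p, c).transpose(0, 1, 2, 4, 3, 5, 6)
    return x.reshape(b, t, hp * p, wp * p, c)

# Arbitrary example sizes (assumptions).
img_ch, num_action_ch, patch_size = 3, 4, 4
rng = np.random.default_rng(0)
frames = rng.random((1, 2, 8, 8, img_ch)).astype(np.float32)

# 1) Patch the frames only.
patched = reshape_patch_sketch(frames, patch_size)             # (1, 2, 2, 2, 48)

# 2) Append the expanded action as extra channels after patching.
action_grid = np.ones((1, 2, 2, 2, num_action_ch), dtype=np.float32)
with_actions = np.concatenate([patched, action_grid], axis=-1)  # (1, 2, 2, 2, 52)

# 3) Strip the action channels, then undo the patching on the frame channels alone.
frames_only = with_actions[..., : img_ch * patch_size ** 2]
restored = reshape_patch_back_sketch(frames_only, patch_size)

print(restored.shape, np.array_equal(restored, frames))
```

Note that the patch-back step would fail on the 52-channel tensor here, since 52 is not divisible into whole image channels of patch_size ** 2 values each plus nothing else; the action channels have to be removed first.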