misconfigured parameter `num_action_ch` for `action_cond_predrnn_v2`
Flunzmas opened this issue · 2 comments
Hi there,
I am having issues using the action-conditional PredRNNV2 for inference.
The way it seems to work (`action_injection=concat`): Load the actions, grid-repeat them and concat the actual video data and the resulting action tensor channel-wise. Then, use `reshape_patch()` and pass the input to the model, resulting in a tensor of shape `[batch, seq_length, height // patch_size, width // patch_size, (img_ch + action_ch) * patch_size ** 2]`.
For the action-conditional PredRNNV2 model, however, the parameter `num_action_ch` is used directly for the input channels for the conv layers instead of `num_action_ch * patch_size ** 2`. For me, this leads to runtime shape mismatches in `forward()`. Is this an error or did I get it wrong somehow?
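To make the shape arithmetic in this question concrete, here is a minimal NumPy sketch of the concat-then-patch order described above. The `reshape_patch` below is a stand-in written for illustration, assumed to behave like the repository's function of the same name (channel-last layout, patches folded into the channel axis); the sizes are arbitrary example values.

```python
import numpy as np

def reshape_patch(x, patch_size):
    # Illustrative stand-in, not the repository's code:
    # fold each (patch_size x patch_size) patch into the channel axis.
    b, t, h, w, c = x.shape
    p = patch_size
    x = x.reshape(b, t, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 2, 4, 3, 5, 6)
    return x.reshape(b, t, h // p, w // p, p * p * c)

# Arbitrary example sizes (assumptions, not taken from any config).
batch, seq_length, height, width = 2, 5, 64, 64
img_ch, action_ch, patch_size = 3, 4, 4

frames = np.zeros((batch, seq_length, height, width, img_ch), dtype=np.float32)
actions = np.zeros((batch, seq_length, action_ch), dtype=np.float32)

# Grid-repeat each action vector over the full-resolution image grid.
action_grid = np.broadcast_to(
    actions[:, :, None, None, :],
    (batch, seq_length, height, width, action_ch),
)

# Concatenate channel-wise first, then patch everything together.
stacked = np.concatenate([frames, action_grid], axis=-1)
patched = reshape_patch(stacked, patch_size)

print(patched.shape)  # (2, 5, 16, 16, 112), i.e. (3 + 4) * 4 ** 2 channels
```

With this order the action occupies `action_ch * patch_size ** 2 = 64` of the 112 channels, which is the count that would not match a conv layer built with `num_action_ch` input channels for the action part.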
Hi,
(1) `num_action_ch` is equal to the dimension of the actions. We expand the action to the size of (height // patch_size, width // patch_size).
(2) The repatch_back is only conducted on the frame. See lines 135-137 in `./core/models/action_cond_predrnn_v2.py`.
Thanks for the quick answer!
I see where I went wrong: for the action-conditional case, the expanded action is concatenated to the frames after `reshape_patch()` and stripped from the result before `reshape_patch_back()`.
I had looked at the shape returned e.g. in `core/data_provider/bair.py` and thought that the actions were included in the input to `reshape_patch()`.
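For anyone landing here with the same confusion, the ordering described in this thread can be sketched in NumPy as follows. Both helper functions are illustrative stand-ins assumed to mirror the repository's `reshape_patch()` / `reshape_patch_back()` in a channel-last layout; the actual model code works on its own tensors and may differ in layout and detail.

```python
import numpy as np

def reshape_patch(x, p):
    # Illustrative stand-in: fold each p x p patch into the channel axis.
    b, t, h, w, c = x.shape
    x = x.reshape(b, t, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 2, 4, 3, 5, 6)
    return x.reshape(b, t, h // p, w // p, p * p * c)

def reshape_patch_back(x, p):
    # Illustrative stand-in: inverse of reshape_patch above.
    b, t, hp, wp, cp = x.shape
    c = cp // (p * p)
    x = x.reshape(b, t, hp, wp, p, p, c)
    x = x.transpose(0, 1, 2, 4, 3, 5, 6)
    return x.reshape(b, t, hp * p, wp * p, c)

# Arbitrary example sizes (assumptions, not taken from any config).
batch, seq_length, height, width = 2, 5, 64, 64
img_ch, num_action_ch, patch_size = 3, 4, 4

rng = np.random.default_rng(0)
frames = rng.random((batch, seq_length, height, width, img_ch)).astype(np.float32)
actions = rng.random((batch, seq_length, num_action_ch)).astype(np.float32)

# 1. Patch the frames only.
patched = reshape_patch(frames, patch_size)
frame_ch = patched.shape[-1]  # img_ch * patch_size ** 2

# 2. Expand the action over the patch grid, not the full-resolution grid.
action_grid = np.broadcast_to(
    actions[:, :, None, None, :],
    (batch, seq_length, height // patch_size, width // patch_size, num_action_ch),
)

# 3. Concatenate after patching: the action adds exactly num_action_ch channels.
model_input = np.concatenate([patched, action_grid], axis=-1)
print(model_input.shape)  # (2, 5, 16, 16, 52), i.e. 3 * 4 ** 2 + 4 channels

# 4. Strip the action channels before reshaping back to full resolution.
restored = reshape_patch_back(model_input[..., :frame_ch], patch_size)
```

In this order the action contributes `num_action_ch` channels rather than `num_action_ch * patch_size ** 2`, which is consistent with the parameter being used directly as the conv input channel count.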