the input of the hallucinator?
seamanj opened this issue · 5 comments
Dear author,
I am a little confused about the hallucinator.
The paper (Figure 2) says "We also train a hallucinator h that takes a single image feature phi_t and learns to hallucinate its temporal representation". However, in the source code, the input is the image features of the whole sequence. Is this a contradiction? Thanks in advance.
```python
def fc2_res(phi, name='fc2_res'):
    """
    Converts pretrained (fixed) resnet features phi into movie strip.
    This applies 2 fc then add it to the orig as residuals.

    Args:
        phi (B x T x 2048): Image feature.
        name (str): Scope.

    Returns:
        Phi (B x T x 2048): Hallucinated movie strip.
    """
```
We run the entire sequence through the hallucinator as one batch for efficiency and code readability. The fully connected layers act on each frame's feature independently, so the output is equivalent to passing in a 1x1x2048 vector B*T times.
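The equivalence described above can be checked numerically. The following is a minimal NumPy sketch, not the repository's TensorFlow code: `make_params` and `fc2_res_sketch` are hypothetical names, the weights are random, the ReLU between the two layers is an assumption, and the feature size is reduced from 2048 to keep the example light.

```python
import numpy as np

def make_params(dim, seed=0):
    """Random weights for two fully connected layers (illustrative only)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(dim)
    return {
        "W1": rng.normal(0.0, scale, (dim, dim)), "b1": np.zeros(dim),
        "W2": rng.normal(0.0, scale, (dim, dim)), "b2": np.zeros(dim),
    }

def fc2_res_sketch(phi, params):
    """Two fc layers plus a residual connection, applied along the last axis."""
    # The ReLU here is an assumption; the thread only says "2 fc" plus a residual.
    hidden = np.maximum(phi @ params["W1"] + params["b1"], 0.0)
    return phi + hidden @ params["W2"] + params["b2"]

B, T, D = 2, 5, 64  # D is 2048 in the thread; kept small here
params = make_params(D)
phi = np.random.default_rng(1).normal(size=(B, T, D))

# One call on the whole B x T x D tensor.
batched = fc2_res_sketch(phi, params)

# B*T separate calls, each on a single 1 x 1 x D feature.
per_frame = np.empty_like(phi)
for b in range(B):
    for t in range(T):
        per_frame[b, t] = fc2_res_sketch(phi[b:b + 1, t:t + 1], params)[0, 0]

print(batched.shape, np.allclose(batched, per_frame))
```

Because the matrix products only mix values along the last axis, no frame sees any other frame's feature, which is why the two results agree up to floating-point tolerance.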
Dear Jason,
Thank you very much for your reply!
Suppose I have a single image feature of size 1x1x2048. After it goes through the hallucinator, what would the dimension of the output Phi be? Is it 1x1x2048 or 1x3x2048?
I mean, in order to predict the past and future poses from Phi, shouldn't Phi contain the past and future information as well? In that case, the dimension of Phi would be 1x3x2048. So is the hallucinator basically a fully connected network from 1x1x2048 to 1x3x2048?
Am I right?
Thanks again!
The hallucinated movie strip \tilde{\Phi} will have dimension 1x1x2048. Refer to Figure 2 in the text. Movie strips (and hallucinated movie strips) are latent representations that should already capture the temporal context, and thus the same representation is passed to the past, present, and future regressors to read out the corresponding poses.
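To illustrate the point about one representation feeding three readouts, here is a rough NumPy sketch. It is only a stand-in: the linear heads replace the actual regressors, `make_regressor` is a hypothetical helper, and `POSE_DIM` is a placeholder value rather than the paper's output size.

```python
import numpy as np

D = 2048        # feature size from the thread
POSE_DIM = 10   # hypothetical placeholder, not the paper's output size
rng = np.random.default_rng(0)

def make_regressor():
    """Build one linear head with its own random weights (stand-in only)."""
    W = rng.normal(0.0, 0.01, (D, POSE_DIM))
    b = np.zeros(POSE_DIM)
    return lambda strip: strip @ W + b

# Three separate heads, one per time offset.
regressors = {
    "past": make_regressor(),
    "present": make_regressor(),
    "future": make_regressor(),
}

# Stand-in for a single hallucinated movie strip of shape 1 x 1 x 2048.
phi_tilde = rng.normal(size=(1, 1, D))

# The SAME vector goes into every head; only the head weights differ.
poses = {name: head(phi_tilde) for name, head in regressors.items()}
for name, pose in poses.items():
    print(name, pose.shape)
```

The temporal information lives inside the 2048 values rather than in an extra time axis, so the strip stays 1x1x2048 and each head extracts a different pose from it.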
So basically, for a specific frame t, \Phi_t will be 1x1x2048 as well? I mean, \Phi_t and \tilde{\Phi} must have the same dimensions in order to compute the loss function in formula (3).
Thanks for your help!
That is correct. My pleasure.
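The shape requirement discussed above can be made explicit with a small check. This sketch assumes a mean squared difference purely for illustration; the thread does not reproduce formula (3), and `hallucination_loss_sketch` is a hypothetical name.

```python
import numpy as np

def hallucination_loss_sketch(Phi_t, Phi_tilde_t):
    """Mean squared difference between two movie strips (assumed form)."""
    # Explicit check: NumPy would otherwise silently broadcast
    # 1 x 1 x 2048 against 1 x 3 x 2048 and hide the mismatch.
    if Phi_t.shape != Phi_tilde_t.shape:
        raise ValueError(
            f"shape mismatch: {Phi_t.shape} vs {Phi_tilde_t.shape}"
        )
    return float(np.mean((Phi_t - Phi_tilde_t) ** 2))

rng = np.random.default_rng(0)
Phi_t = rng.normal(size=(1, 1, 2048))        # movie strip for frame t
Phi_tilde_t = rng.normal(size=(1, 1, 2048))  # hallucinated movie strip

loss = hallucination_loss_sketch(Phi_t, Phi_tilde_t)
print(loss)
```

With both strips at 1x1x2048 the loss is a single scalar per frame; a 1x3x2048 hallucinated strip would be rejected by the check rather than compared against a 1x1x2048 target.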