karpathy/neuraltalk2

Why can the language model be trained on a randomly initialized embedding of the image feature?

vanpersie32 opened this issue · 0 comments

I noticed the layer after the VGG network. It is a linear layer that embeds the image feature:
https://github.com/karpathy/neuraltalk2/blob/master/misc/net_utils.lua#L38

In the training stage, this layer is not trained or finetuned, which means the output of the CNN branch is a random projection:
https://github.com/karpathy/neuraltalk2/blob/master/train.lua#L39

So the input to the language model is random. I thought it could not be trained to a good result, but after 100000 iterations the model actually reaches a CIDEr of 0.8. Very weird!
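For anyone puzzling over the same thing, here is a minimal NumPy sketch of the situation being asked about (this is my own illustration with assumed dimensions, not the actual Torch code): the embedding weights are random, but they are *fixed* across iterations, so the language model always sees the same deterministic projection of the pretrained VGG features, and a random linear projection still roughly preserves similarity between feature vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: VGG fc7 feature -> language-model input
feat_dim, embed_dim = 4096, 512

# Randomly initialized linear layer; in the scenario above it is
# never updated during training, so W stays fixed.
W = rng.standard_normal((feat_dim, embed_dim)) / np.sqrt(feat_dim)

def embed(cnn_features):
    """Fixed random linear embedding of pretrained CNN features."""
    return cnn_features @ W

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Two different images give different but deterministic embeddings,
# and the random projection approximately preserves their similarity,
# so the LM still receives a consistent, informative signal.
img_a = rng.standard_normal(feat_dim)
img_b = rng.standard_normal(feat_dim)
print(cos(img_a, img_b), cos(embed(img_a), embed(img_b)))
```

The point of the sketch: "random" here means "arbitrary but constant", not "noise resampled every step", which is why the language model can still fit it.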