train on synthetic dataset
nanlliu opened this issue · 2 comments
Image to text: R@1 is bad and fluctuates around 1.5.
Text to image: R@1 is bad and fluctuates around 0.5.
My vocabulary size is small (~100 words).
Do you think training my own ResNet image/text encoders on synthetic images would help?
Much appreciated!
A small vocabulary size shouldn't be an issue. Unless your data is not natural images, R@1 shouldn't be that bad. The text encoder is always trained from scratch and is unrelated to the image encoder.
My understanding is that you are training on a new dataset. In that case, make sure the vocabulary has been rebuilt correctly for your dataset and that it is being used consistently. It would help to create a data loader and print and visualize some samples.
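For a quick sanity check, something like the sketch below (the captions and the `Vocab` class are illustrative, not from this codebase) can confirm that a vocabulary rebuilt from the new captions round-trips cleanly, i.e. no words fall back to `<unk>`:

```python
# Sanity check for a rebuilt vocabulary on a new dataset.
# The Vocab class and sample captions here are hypothetical stand-ins
# for whatever vocab builder the codebase actually uses.
from collections import Counter

class Vocab:
    def __init__(self, captions, min_count=1):
        # Count whitespace-split words across all training captions.
        counts = Counter(w for c in captions for w in c.lower().split())
        self.itos = ["<pad>", "<unk>"] + sorted(
            w for w, n in counts.items() if n >= min_count)
        self.stoi = {w: i for i, w in enumerate(self.itos)}

    def encode(self, caption):
        # Unknown words map to index 1 (<unk>).
        return [self.stoi.get(w, 1) for w in caption.lower().split()]

    def decode(self, ids):
        return " ".join(self.itos[i] for i in ids)

captions = ["a red cube on a table", "a blue sphere next to a cube"]
vocab = Vocab(captions)
for cap in captions:
    ids = vocab.encode(cap)
    # In-vocab captions must round-trip with no <unk> leakage.
    assert vocab.decode(ids) == cap
    print(cap, "->", ids)
```

If a training caption fails this round-trip, the vocabulary was built from the wrong data (e.g. the original dataset's vocab file is still being loaded).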
I'm actually using images generated by Blender, so I guess that may be the reason?
I have attempted to train a new image encoder on my data. My vocab is small and I didn't find any issues with it.
However, the embeddings from this new image encoder still seem bad:
R@1 ≈ 0.1 for both image and caption retrieval.
Could you share some common practices for training on a new dataset?
Thank you for your reply! Much appreciated!