train on synthetic dataset
nanlliu opened this issue · 2 comments
Image to text: R@1 is bad and fluctuates around 1.5.
Text to image: R@1 is bad and fluctuates around 0.5.
My vocabulary size is small (~100 words).
Do you think training my own ResNet image/text encoders on synthetic images would help?
Much appreciated!
A small vocabulary size shouldn't be an issue. Unless your data is not natural images, R@1 shouldn't be that bad. The text encoder is always trained from scratch and is unrelated to the image encoder.
My understanding is that you are training on a new dataset. In that case, make sure the vocabulary has been rebuilt correctly for your dataset and that it is being used consistently. It would help to create a data loader and print and visualize some samples.
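For a quick sanity check, something like the sketch below (the captions and the `Vocab` class are illustrative, not from this codebase) can confirm that a vocabulary rebuilt from the new captions round-trips cleanly, i.e. no words fall back to `<unk>`:

```python
# Sanity check for a rebuilt vocabulary on a new dataset.
# The Vocab class and sample captions here are hypothetical stand-ins
# for whatever vocab builder the codebase actually uses.
from collections import Counter

class Vocab:
    def __init__(self, captions, min_count=1):
        # Count whitespace-split words across all training captions.
        counts = Counter(w for c in captions for w in c.lower().split())
        self.itos = ["<pad>", "<unk>"] + sorted(
            w for w, n in counts.items() if n >= min_count)
        self.stoi = {w: i for i, w in enumerate(self.itos)}

    def encode(self, caption):
        # Unknown words map to index 1 (<unk>).
        return [self.stoi.get(w, 1) for w in caption.lower().split()]

    def decode(self, ids):
        return " ".join(self.itos[i] for i in ids)

captions = ["a red cube on a table", "a blue sphere next to a cube"]
vocab = Vocab(captions)
for cap in captions:
    ids = vocab.encode(cap)
    # In-vocab captions must round-trip with no <unk> leakage.
    assert vocab.decode(ids) == cap
    print(cap, "->", ids)
```

If a training caption fails this round-trip, the vocabulary was built from the wrong data (e.g. the original dataset's vocab file is still being loaded).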
I'm actually using images generated by Blender, so I guess that may be the reason?
I have attempted to train a new image encoder on my data. My vocab is small and I didn't find any issues with it.
However, the embeddings from this new image encoder still seem bad:
R@1 ≈ 0.1 for both image and caption retrieval.
Could you share some common practices for training on a new dataset?
Thank you for your reply! Much appreciated!