single caption query
wingz1 opened this issue · 4 comments
This code works quite well. Thanks for sharing it.
I'm wondering: do you have any code snippets showing how one might take a trained VSE++ model, create a caption query from text (i.e. a string), submit it to the model to get a single caption embedding, and then search for matching images that have also been mapped into the joint space by the same model?
It's easy to do the comparison once numpy arrays for the caption and image embeddings in the joint space are created, but it's not clear how to use your model with a brand-new caption query, or with a set of CNN image features that are not part of some complete COCO/FLICKR/etc. train or test set with corresponding caption/image pairs.
Thanks for any tips. I'd prefer not to rewrite everything if you already have some additional tools for this.
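(For concreteness, the comparison step mentioned above is just a similarity ranking over the embedding arrays. A minimal sketch with made-up embeddings; VSE++ L2-normalizes its outputs, so a dot product is a cosine similarity:)

```python
import numpy as np

# Hypothetical joint-space embeddings standing in for real model output.
rng = np.random.RandomState(0)
img_embs = rng.randn(1000, 1024)
img_embs /= np.linalg.norm(img_embs, axis=1, keepdims=True)
cap_emb = rng.randn(1, 1024)
cap_emb /= np.linalg.norm(cap_emb)

# Dot product against L2-normalized vectors == cosine similarity.
scores = img_embs.dot(cap_emb.T).ravel()   # (n_images,)
top_k = np.argsort(scores)[::-1][:10]      # indices of the 10 best-matching images
```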
I don't have any particular script for that purpose, but you can look at the function encode_data (line 73 at commit 226688a) to get an idea: encode_data gets its input from a data_loader and encodes all the images and captions provided by that loader. It's probably easiest to write a special data loader class that handles your data. For that, take a look at data.py.
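(As a rough illustration, not code from the repo: a caption-only dataset plus a padding collate_fn in the style of data.py might look like the sketch below. `CaptionDataset` is a hypothetical name, and to reuse encode_data unchanged each batch would also need image tensors.)

```python
import nltk
import torch
import torch.utils.data as data

class CaptionDataset(data.Dataset):
    """Hypothetical dataset over a plain list of caption strings,
    modeled on the preprocessing in data.py."""

    def __init__(self, raw_captions, vocab):
        self.raw_captions = raw_captions
        self.vocab = vocab

    def __getitem__(self, index):
        tokens = nltk.tokenize.word_tokenize(self.raw_captions[index].lower())
        ids = ([self.vocab('<start>')] +
               [self.vocab(token) for token in tokens] +
               [self.vocab('<end>')])
        return torch.LongTensor(ids), index

    def __len__(self):
        return len(self.raw_captions)

def collate_fn(batch):
    # Sort by caption length (descending) and zero-pad, as data.py does,
    # so the text encoder's packed RNN receives valid lengths.
    batch.sort(key=lambda x: len(x[0]), reverse=True)
    captions, ids = zip(*batch)
    lengths = [len(cap) for cap in captions]
    targets = torch.zeros(len(captions), max(lengths)).long()
    for i, cap in enumerate(captions):
        targets[i, :lengths[i]] = cap
    return targets, lengths, list(ids)
```

A `torch.utils.data.DataLoader(CaptionDataset(my_captions, vocab), batch_size=128, collate_fn=collate_fn)` would then yield padded batches that model.txt_enc can consume.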
@wingz1 were you able to do it? Any snippets or tips?
I have a similar task at hand: I want to use COCO captions to retrieve the top-k images.
Yes, actually. I added a `def caption2emb(model, mycaption, vocab)` function to evaluate.py.
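(The thread doesn't include the snippet itself, but based on the caption preprocessing in data.py and the encoding path in encode_data, such a function might look roughly like this. `caption2emb` is wingz1's name for it; the `torch.no_grad()` usage assumes a reasonably recent PyTorch in place of the volatile Variables the repo uses.)

```python
import nltk
import torch

def caption2emb(model, mycaption, vocab):
    """Map a single caption string into the joint embedding space.

    Sketch only: assumes `model` is a trained VSE object and `vocab`
    is the Vocabulary used at training time.
    """
    model.val_start()  # put both encoders in eval mode
    tokens = nltk.tokenize.word_tokenize(str(mycaption).lower())
    ids = ([vocab('<start>')] +
           [vocab(token) for token in tokens] +
           [vocab('<end>')])
    captions = torch.LongTensor([ids])  # a batch containing one caption
    if torch.cuda.is_available():
        captions = captions.cuda()
    with torch.no_grad():
        cap_emb = model.txt_enc(captions, [len(ids)])
    return cap_emb.cpu().numpy()  # shape (1, embed_dim)
```

The returned array can be compared directly against the image embeddings produced by encode_data, e.g. with the dot-product ranking shown earlier in this thread.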