LinWeizheDragon/FLMR

Can this model achieve retrieval from text to (image + text)?

Closed this issue · 4 comments

Can this model achieve retrieval from text to (image + text)? For example, I have a text query and a database that contains images and their corresponding descriptions. For each image in the database, I want to retrieve against the fused features of its visual embeddings and text embeddings. If this is possible, how should I implement it? Thank you very much!

I noticed that the appendix of the paper mentioned 'Retrieving Multi-modal Documents with FLMR,' but I'm not sure how to use the related code. Could you please provide some guidance? Thank you very much!

Hi, please see the README file. This is already implemented:

# Option 3. multi-modal documents with images
# random_images = torch.randn(num_items, 3, 224, 224)
# to_img = ToPILImage()
# if not os.path.exists("./test_images"):
#     os.makedirs("./test_images")
# for i, image in enumerate(random_images):
#     image = to_img(image)
#     image.save(os.path.join("./test_images", "{}.jpg".format(i)))

# image_paths = [os.path.join("./test_images", "{}.jpg".format(i)) for i in range(num_items)]

# custom_collection = [
#     (passage_content, None, image_path)
#     for passage_content, image_path in zip(passage_contents, image_paths)
# ]

Note, however, that image+text -> image+text data is quite sparse, so we did not pre-train the PreFLMR models on image+text -> image+text retrieval. Performance may therefore be suboptimal until you fine-tune the model on your own text -> image+text task.
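
For a database of real images with descriptions, the same tuple layout as in the README snippet above, (passage_content, None, image_path), can be built from files on disk. The following is only a minimal sketch: `build_collection`, its parameters, and the convention that each description is keyed by the image's file stem are assumptions made for illustration and are not part of the FLMR codebase. The middle element is left as `None` exactly as in the README snippet; its meaning is not explained in this thread.

```python
import os

# Hypothetical helper, NOT part of FLMR. It only reproduces the tuple layout
# (passage_content, None, image_path) shown in the README snippet.
def build_collection(descriptions, image_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Pair each image in `image_dir` with its text description.

    descriptions: dict mapping an image file stem (e.g. "0001") to its text.
    Images without a description and non-image files are skipped.
    """
    collection = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in extensions:
            continue  # not an image file
        if stem not in descriptions:
            continue  # no text available for this image
        image_path = os.path.join(image_dir, name)
        # Middle element kept as None, as in the README snippet.
        collection.append((descriptions[stem], None, image_path))
    return collection
```

The resulting list would take the place of `custom_collection` in the README example, with the rest of the indexing steps followed as the README describes them.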

This is great, thank you very much. I'll go try it right away.