jina-ai/examples

How to perform a parallel search over two types of data (text and image)?

ilham-bintang opened this issue · 3 comments

Let's say I have a dataset of scanned math questions, cropped one per image, and the query is a single question picture. I want to get the top-k most similar questions.

So each question needs to be processed in two ways:

  • extract the text/LaTeX using OCR
  • crop the math graph (if any), then check graph similarity

How do I perform the search and merge the results?

Hi @ilham-bintang, thanks for your interest in Jina!

I would suggest you first:

  1. Research whether there is a third-party OCR library for math/LaTeX expression extraction, and wrap that Python library as a Jina Executor; check out how to create a custom Executor.
  2. Create an Encoder to encode your text/LaTeX and math graphs into the same embedding space (not sure if there is a model designed for this purpose).
  3. Take a look at the Jina hello multimodal example in Jina core (jina/helloworld/multimodal); it shares the same rationale as what you're trying to do: searching across different modalities of data.
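To make the pipeline concrete, here is a minimal framework-free sketch of steps 1 and 2. The `ocr_latex` and `encode` functions are hypothetical stubs standing in for a real OCR library and a trained encoder model; in a real Jina Flow each would live inside its own Executor.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Doc:
    """Minimal stand-in for a Jina Document: raw image plus derived fields."""
    image: bytes
    text: str = ""
    embedding: List[float] = field(default_factory=list)

def ocr_latex(image: bytes) -> str:
    # Hypothetical stub: a real implementation would wrap a math/LaTeX
    # OCR library inside a custom Jina Executor (step 1).
    return image.decode(errors="ignore")

def encode(text: str) -> List[float]:
    # Hypothetical stub: a real Encoder would map text/LaTeX and graph
    # crops into the same embedding space with a trained model (step 2).
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

def index(docs: List[Doc]) -> List[Doc]:
    # Run OCR, then encode, for every scanned question in the corpus.
    for d in docs:
        d.text = ocr_latex(d.image)
        d.embedding = encode(d.text)
    return docs

corpus = index([Doc(b"x^2 + 1 = 0"), Doc(b"\\int_0^1 x dx")])
```

A query image would go through the same two stubs, and its embedding would be compared against `corpus` embeddings to get the top-k hits.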

I'm not quite sure if I understand your question correctly; let me know if you have further questions.

Hey @ilham-bintang ,

Can you give more details on how you want to perform the search?

What is your index data? Images and text, or just one of them? How do you expect to query the system?

Hi @bwanglzu and @JoanFM

Yes, I want to index both the images and the text and search over both, then rank/weight/merge the two result lists into a single list (sorted by similarity to the query).
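One simple way to merge two per-modality result lists is weighted score fusion. This is just an illustrative sketch, not Jina's built-in ranker: the doc ids, scores, and weights below are made up, and scores are assumed to be comparable similarities (higher = more similar).

```python
from typing import List, Tuple

def merge_results(
    text_hits: List[Tuple[str, float]],
    image_hits: List[Tuple[str, float]],
    w_text: float = 0.6,
    w_image: float = 0.4,
) -> List[Tuple[str, float]]:
    """Combine per-modality similarity scores into one ranked list."""
    combined = {}
    for doc_id, score in text_hits:
        combined[doc_id] = combined.get(doc_id, 0.0) + w_text * score
    for doc_id, score in image_hits:
        combined[doc_id] = combined.get(doc_id, 0.0) + w_image * score
    # Sort by fused score, best match first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical hits from the text/LaTeX index and the graph index:
text_hits = [("q1", 0.9), ("q2", 0.4)]
image_hits = [("q2", 0.8), ("q3", 0.5)]
ranked = merge_results(text_hits, image_hits)
# q2 wins because it scores well in both modalities.
```

The weights control how much each modality contributes; tuning them (or normalizing scores per modality first) is usually necessary when the two indexes produce scores on different scales.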

I think I need to learn more about Jina hello multimodal for now. Anyway, thanks a lot for the explanation. Just stopping by to see whether this framework could fit my use case.