jina-ai/examples

How to perform a parallel search over two types of data (text and image)?

ilham-bintang opened this issue · 3 comments

Let's say I have a dataset of scanned math questions, cropped one per image, and the query is a single question picture. I want to get the top-k most similar questions.

So each question needs to be processed in two ways:

  • extract the text/LaTeX using OCR
  • crop the math graph (if any), then check graph similarity

How do I perform the search and merge the results?

Hi @ilham-bintang, thanks for your interest in Jina!

I would suggest you first:

  1. Research whether there is a third-party OCR library for math/LaTeX expression extraction, and wrap that Python library as a Jina Executor; check out how to create a custom Executor.
  2. Create an Encoder to encode your text/LaTeX and math graphs into the same embedding space (not sure if there is a model designed for this purpose).
  3. Take a look at the Jina hello multimodal example in Jina core (jina/helloworld/multimodal); it shares the same rationale as what you're trying to do: searching across different modalities of data.
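To make the pipeline concrete, here is a minimal framework-free sketch of steps 1 and 2. The `ocr_latex` and `encode` functions are hypothetical stubs standing in for a real OCR library and a trained encoder model; in a real Jina Flow each would live inside its own Executor.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Doc:
    """Minimal stand-in for a Jina Document: raw image plus derived fields."""
    image: bytes
    text: str = ""
    embedding: List[float] = field(default_factory=list)

def ocr_latex(image: bytes) -> str:
    # Hypothetical stub: a real implementation would wrap a math/LaTeX
    # OCR library inside a custom Jina Executor (step 1).
    return image.decode(errors="ignore")

def encode(text: str) -> List[float]:
    # Hypothetical stub: a real Encoder would map text/LaTeX and graph
    # crops into the same embedding space with a trained model (step 2).
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

def index(docs: List[Doc]) -> List[Doc]:
    # Run OCR, then encode, for every scanned question in the corpus.
    for d in docs:
        d.text = ocr_latex(d.image)
        d.embedding = encode(d.text)
    return docs

corpus = index([Doc(b"x^2 + 1 = 0"), Doc(b"\\int_0^1 x dx")])
```

A query image would go through the same two stubs, and its embedding would be compared against `corpus` embeddings to get the top-k hits.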

I'm not quite sure if I understand your question correctly; let me know if you have further questions.

Hey @ilham-bintang ,

Can you give more details on how you want to perform the search?

What is your index data? Images and text, or just one of them? How do you expect to query the system?

Hi @bwanglzu and @JoanFM

Yes, I want to index both the images and the text and search over both, then rank/weight/merge the two result lists into a single list (sorted by similarity to the query).
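One simple way to merge two per-modality result lists is weighted score fusion. This is just an illustrative sketch, not Jina's built-in ranker: the doc ids, scores, and weights below are made up, and scores are assumed to be comparable similarities (higher = more similar).

```python
from typing import List, Tuple

def merge_results(
    text_hits: List[Tuple[str, float]],
    image_hits: List[Tuple[str, float]],
    w_text: float = 0.6,
    w_image: float = 0.4,
) -> List[Tuple[str, float]]:
    """Combine per-modality similarity scores into one ranked list."""
    combined = {}
    for doc_id, score in text_hits:
        combined[doc_id] = combined.get(doc_id, 0.0) + w_text * score
    for doc_id, score in image_hits:
        combined[doc_id] = combined.get(doc_id, 0.0) + w_image * score
    # Sort by fused score, best match first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical hits from the text/LaTeX index and the graph index:
text_hits = [("q1", 0.9), ("q2", 0.4)]
image_hits = [("q2", 0.8), ("q3", 0.5)]
ranked = merge_results(text_hits, image_hits)
# q2 wins because it scores well in both modalities.
```

The weights control how much each modality contributes; tuning them (or normalizing scores per modality first) is usually necessary when the two indexes produce scores on different scales.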

I think I need to learn more about Jina hello multimodal for now. Anyway, thanks a lot for the explanation. Just stopping by to see whether this framework could fit my use case.