How to perform a parallel search over two types of data (text and image)?
ilham-bintang opened this issue · 3 comments
Let's say I have a dataset of scanned math questions, cropped per question, and the query is a single question image. I want to get the top-k most similar questions.
So each question needs to be processed in two ways:
- extract the text/LaTeX using OCR
- crop the math graph (if any), then check graph similarity
How do I perform the search, and how do I merge the results?
Hi @ilham-bintang, thanks for your interest in Jina!
I would suggest the following:
- Research whether there is a third-party OCR library for math/LaTeX expression extraction, and wrap that Python library as a Jina Executor; check out how to create a custom executor (a minimal sketch follows this list).
- Create an Encoder to embed your text/LaTeX and your math graphs into the same embedding space (I'm not sure a model designed for this purpose exists; see the second sketch below).
- Take a look at the Jina hello multimodal example in Jina core (jina/helloworld/multimodal); it shares the same rationale as what you're trying to do: searching across different modalities of data.
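For the first point, here is a minimal sketch of what wrapping an OCR tool as a custom Executor could look like. `image_to_latex` is a placeholder stub standing in for a real third-party math-OCR call (e.g. pix2tex), not an actual Jina or library API, and the `doc.blob` field is assumed to hold the decoded question image:

```python
from jina import Executor, requests, DocumentArray


def image_to_latex(image) -> str:
    # Placeholder stub: swap in a real math-OCR library here (e.g. pix2tex).
    return r'\frac{a}{b}'


class MathOCRExecutor(Executor):
    """Extracts LaTeX text from scanned question images during indexing."""

    @requests(on='/index')
    def extract(self, docs: DocumentArray, **kwargs):
        for doc in docs:
            # doc.blob is assumed to hold the decoded question image
            doc.text = image_to_latex(doc.blob)
```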
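For the second point, a rough sketch of an encoder that puts both modalities into one embedding space. The `embed_text`/`embed_image` stubs are placeholders for whatever dual-encoder model you end up choosing; the random vectors are only there to make the sketch runnable:

```python
import numpy as np
from jina import Executor, requests, DocumentArray


def embed_text(text: str) -> np.ndarray:
    # Placeholder: replace with a real text/LaTeX encoder.
    return np.random.rand(512).astype('float32')


def embed_image(image) -> np.ndarray:
    # Placeholder: replace with an image encoder trained into the same space.
    return np.random.rand(512).astype('float32')


class SharedSpaceEncoder(Executor):
    """Embeds LaTeX text and graph crops into one shared vector space."""

    @requests
    def encode(self, docs: DocumentArray, **kwargs):
        for doc in docs:
            if doc.text:
                doc.embedding = embed_text(doc.text)
            else:
                doc.embedding = embed_image(doc.blob)
```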
I'm not quite sure I understood your question correctly; let me know if you have further questions.
Hey @ilham-bintang ,
Can you give more details on how you want to perform the search?
What is your index data: images and text, or just one of them? How do you expect to query the system?
Yes, I want to index both the images and the text, search against both, and then rank/weight/merge the two result lists into a single list sorted by similarity to the query.
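Something like this is what I have in mind for the merge step (a plain-Python sketch; the `(doc_id, score)` pair format and the weights are just my assumptions, with a higher score meaning more similar):

```python
def merge_matches(text_matches, image_matches, w_text=0.6, w_image=0.4, top_k=10):
    """Weighted late fusion of two ranked result lists.

    Each list holds (doc_id, score) pairs where a higher score means
    more similar; the weights are hyperparameters to tune per dataset.
    """
    combined = {}
    for doc_id, score in text_matches:
        combined[doc_id] = combined.get(doc_id, 0.0) + w_text * score
    for doc_id, score in image_matches:
        combined[doc_id] = combined.get(doc_id, 0.0) + w_image * score
    # Sort by fused score, best first, and keep the top-k
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```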
I think I need to learn more about the Jina hello multimodal example for now. Anyway, thanks a lot for the explanation. I'm just stopping by to see whether this framework could fit my use case.