It would be better if it could also handle the type "text", such as text embedding

Question


Lbaiall opened this issue 4 months ago · 1 comment

Hi! Thanks for open-sourcing ColPali! ^.^
But I have a small question about the https://github.com/illuin-tech/colpali/blob/9413418c110da49b25ac2dae2c32b8fc067ff332/scripts/infer/run_inference_with_python.py script.
It runs fine as-is, but I then switched it to my own local Parquet file, which I generated for my needs: I stored each image as bytes in an image_data column, together with other information such as image_id, image_type, image_content, ... The loading code is:

```python
import io
import pandas as pd
from PIL import Image

def load_from_parquet(parquet_path):
    df = pd.read_parquet(parquet_path)
    images = [Image.open(io.BytesIO(b)) for b in df["image_data"]]  # decode raw bytes
    return images, df
```

With this setup it does a really good job on the retrieved results for my queries. But it made me wonder whether it could also retrieve images while respecting the type I assign to each image, or some meaning the image carries within its PDF, such as abstract concepts or the actual name it has in the PDF.
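Since the Parquet file already stores an image_type per row, one way to respect a requested type without any text model is to filter on that metadata and rank the survivors by their image-retrieval score. The sketch below assumes you already have one score per row (the hypothetical `scores` array, e.g. computed with ColPali); `filtered_top_k` is a hypothetical helper, not part of the ColPali repo.

```python
import numpy as np
import pandas as pd


def filtered_top_k(df: pd.DataFrame, scores: np.ndarray, allowed_types: set, k: int = 3) -> pd.DataFrame:
    """Return the top-k rows by score, keeping only rows whose image_type is allowed.

    Assumption: scores[i] is the retrieval score of row i (e.g. from ColPali).
    """
    ranked = df.assign(score=scores)
    # Hard metadata filter on the image_type column stored in the Parquet file
    ranked = ranked[ranked["image_type"].isin(allowed_types)]
    return ranked.sort_values("score", ascending=False).head(k)


if __name__ == "__main__":
    df = pd.DataFrame({"image_id": [1, 2, 3, 4],
                       "image_type": ["chart", "table", "chart", "photo"]})
    scores = np.array([0.2, 0.9, 0.7, 0.8])
    print(filtered_top_k(df, scores, {"chart"}, k=2)["image_id"].tolist())  # [3, 1]
```

This only handles types you have labeled explicitly; it does not make the model understand the text or meaning behind the label.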

This is the conclusion I came to after a lot of experiments.
Maybe I am missing a "text understanding" module. In my view, it would work like the text embedding part of a RAG system. I don't want to add yet another text embedding model, but it really seems I would have to in order to improve accuracy for queries related to text or type.
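If a separate text embedding model is added, one common way to combine it with image retrieval is reciprocal rank fusion (RRF), which merges ranked lists without needing the two models' scores to be on the same scale. This is a swapped-in, generic technique, not something ColPali provides; the sketch assumes you already have one ranking of page IDs from ColPali and one from a text model, and `reciprocal_rank_fusion` is a hypothetical helper.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs; each list is ordered best-first.

    Each ID receives sum(1 / (k + rank)) over the lists it appears in.
    k=60 is a commonly used default that damps the influence of top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    image_ranking = ["p3", "p1", "p7"]  # hypothetical ColPali ranking
    text_ranking = ["p1", "p5", "p3"]   # hypothetical text-embedding ranking
    print(reciprocal_rank_fusion([image_ranking, text_ranking]))  # ['p1', 'p3', 'p5', 'p7']
```

Pages found by both retrievers (p1, p3) rise to the top, while pages found by only one are still kept.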

That is the question I wanted to raise. If I have overlooked or misunderstood this functionality, please let me know and I will report back. I would be very grateful.

What is ColPali really based on? Only images? I mean, it would be better if it could also take the type "text" into account.

Answer 1 · 2024-08-23T12:04:34.000Z

Hello! So yes, ColPali only takes images into account!
That is what makes it so different from existing systems that rely on text, and it enables many new things!
Having said that, many text embedding models are great, and if you want to embed text rather than the image, you can typically use a model like BGE-M3!
Cheers,
Manu
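To make "ColPali only takes images into account" concrete: ColPali embeds each page image as many patch vectors and each query as many token vectors, then scores them with ColBERT-style late interaction (MaxSim). Each query token takes its best-matching patch, and those maxima are summed. The NumPy sketch below uses toy 2-dimensional embeddings for illustration; it is not the colpali-engine implementation, and `maxsim_score` is a hypothetical name.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction score between one query and one page.

    query_emb: (n_query_tokens, dim) token embeddings of the query
    page_emb:  (n_patches, dim) patch embeddings of the page image
    """
    sim = query_emb @ page_emb.T          # (n_query_tokens, n_patches) dot products
    return float(sim.max(axis=1).sum())   # best patch per query token, summed


if __name__ == "__main__":
    query = np.array([[1.0, 0.0], [0.0, 1.0]])
    page = np.array([[1.0, 0.0], [0.5, 0.5]])
    print(maxsim_score(query, page))  # 1.5
```

Because the page side is built purely from image patches, any text on the page only contributes through how it looks in the image, which is why labels such as image_type have to be handled outside the model.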