illuin-tech/colpali

It would be better if it could also involve the "text" type, such as text embeddings

Lbaiall opened this issue · 1 comments

Hi! Thanks for open-sourcing ColPali! ^.^
I have a small question about using the https://github.com/illuin-tech/colpali/blob/9413418c110da49b25ac2dae2c32b8fc067ff332/scripts/infer/run_inference_with_python.py script.
It runs fine as is. I then changed the input to my local Parquet file, which I generated for my own needs: each image is stored as image_data, together with other fields such as image_id, image_type, image_content, ... My loading code is:

import io
import pandas as pd
from PIL import Image

def load_from_parquet(parquet_path):
    # Decode the raw bytes in the image_data column into PIL images
    df = pd.read_parquet(parquet_path)
    images = [Image.open(io.BytesIO(row['image_data'])) for _, row in df.iterrows()]
    return images, df

With this, the search results for my queries are really good. But it made me wonder whether ColPali could also retrieve images according to the type I assign to each image, or according to some of its meaning within the PDF it belongs to, such as abstract concepts or the actual name it has in the PDF.
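
As a rough illustration of retrieving "while obeying the type", one option that does not touch the model is to filter on the metadata stored next to each image and only then rank by similarity. The sketch below is an assumption, not part of ColPali: `Page`, `filter_then_rank`, and the example scores are hypothetical, and the scores stand in for whatever similarity the retriever has already computed for a query.

```python
from dataclasses import dataclass


@dataclass
class Page:
    image_id: str
    image_type: str   # metadata column stored alongside image_data
    score: float      # hypothetical similarity score from the retriever


def filter_then_rank(pages, wanted_type, top_k=3):
    """Keep only pages of the requested type, then sort by score (highest first)."""
    candidates = [p for p in pages if p.image_type == wanted_type]
    return sorted(candidates, key=lambda p: p.score, reverse=True)[:top_k]


pages = [
    Page("a", "chart", 0.91),
    Page("b", "table", 0.95),
    Page("c", "chart", 0.42),
]

top = filter_then_rank(pages, wanted_type="chart")
print([p.image_id for p in top])  # ['a', 'c']
```

This only enforces an exact match on a type label that already exists in the Parquet file; it does not give the model any understanding of the text itself.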

This is the conclusion I came to after a lot of experiments.
Maybe what I am missing is a "text understanding" module; in my view it would work like the text embedding part of a RAG system. I don't want to add yet another text embedding model, but in practice I had to add one to increase the accuracy of responses related to text or type.

That is my question. If I have overlooked or misunderstood an existing feature, please let me know and I will give you feedback. I would be very grateful.

What is ColPali really based on? Pure images? I mean, it would be better if it could also take the "text" type into account.

Hello! So yes, ColPali only takes images into account!
That is the reason it is so different from existing systems that take text into account, and it enables many new things!
Having said that, many text embedding models are great, and you can typically use models like BGE-M3 for text embeddings if you don't want to embed the image!
Cheers,
Manu
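
If an image retriever and a separate text embedding model are run side by side, their two result lists have to be merged somehow. The thread does not say how; one common option is reciprocal rank fusion, sketched below under that assumption. The function name, the page ids, and the constant `k=60` are illustrative choices, and the two input rankings stand in for results that would come from ColPali and from a text model such as BGE-M3.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids: each id scores the sum of 1 / (k + rank)."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused, key=lambda d: fused[d], reverse=True)


# Hypothetical rankings (best first) for the same query
image_ranking = ["p3", "p1", "p2"]   # e.g. from the image retriever
text_ranking = ["p1", "p4", "p3"]    # e.g. from a text embedding model

merged = reciprocal_rank_fusion([image_ranking, text_ranking])
print(merged)  # ['p1', 'p3', 'p4', 'p2']
```

Because it only uses rank positions, this kind of fusion does not require the two models' scores to be on the same scale.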