illuin-tech/colpali

OCR with colpali

lukiod opened this issue · 1 comment

Is it possible to use ColPali to extract the text from an image and return it in a structured format (JSON or plain text)?

Hello! That's kind of the opposite of the point of ColPali: it retrieves document pages directly from their images, so that you don't need an OCR step. But most VLMs nowadays can definitely extract text, so you can combine ColPali for retrieving the page you want with a VLM to do just that!
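
For illustration, here is a minimal sketch of that two-stage idea: pick the best-matching page first, then transcribe only that page. Everything named here is hypothetical (`retrieve_then_extract`, `stub_score`, `stub_extract`, `FAKE_PAGES`); the stubs stand in for a real ColPali scorer and a real VLM call, neither of which is shown, so treat this as a shape for the pipeline rather than working OCR.

```python
import json
from typing import Callable, Sequence


def retrieve_then_extract(
    query: str,
    pages: Sequence[str],
    score_page: Callable[[str, str], float],
    extract_text: Callable[[str], str],
) -> str:
    """Select the highest-scoring page for the query, then transcribe it.

    score_page   -- hypothetical hook: would wrap a retriever such as ColPali
    extract_text -- hypothetical hook: would wrap a VLM prompted to transcribe
    Returns a JSON string with the chosen page index, its score, and its text.
    """
    if not pages:
        raise ValueError("no pages to search")
    scores = [score_page(query, page) for page in pages]
    best = max(range(len(pages)), key=scores.__getitem__)
    return json.dumps(
        {
            "page_index": best,
            "score": scores[best],
            "text": extract_text(pages[best]),
        }
    )


# Stand-in data and models so the sketch runs without any ML dependency.
# In practice the keys would be page images and the values would not exist.
FAKE_PAGES = {
    "page_0.png": "invoice total 42 EUR",
    "page_1.png": "shipping address Paris",
}


def stub_score(query: str, page: str) -> float:
    """Toy relevance: number of words shared between query and page text."""
    page_words = set(FAKE_PAGES[page].lower().split())
    return float(len(page_words & set(query.lower().split())))


def stub_extract(page: str) -> str:
    """Toy 'VLM': returns the text we already know is on the page."""
    return FAKE_PAGES[page]


result = retrieve_then_extract(
    "invoice total", list(FAKE_PAGES), stub_score, stub_extract
)
print(result)
```

Keeping the retriever and the VLM behind two plain callables means either one can be swapped without touching the other, and the expensive transcription step only runs on the single page that was retrieved.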