Techniques to facilitate multi-modal information retrieval, using advanced document parsing libraries to extract the distinct modalities from content.
An API service that performs advanced PDF parsing and metadata extraction.
Create a pdfservices-api-credentials.json file,
then use Postman to interact with the API service.
This library is useful for parsing the text structure of documents.
To run pdf2htmlEX, follow the setup instructions and then run:
APPIMAGE_EXTRACT_AND_RUN=1 ./pdf2htmlex.AppImage <pdfname.pdf>
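
When several PDFs need converting, the command above can be wrapped in a small loop. The following is a minimal sketch, not part of the project itself: the function name convert_all and the PDF2HTMLEX variable are hypothetical, and it assumes the AppImage sits at ./pdf2htmlex.AppImage unless PDF2HTMLEX says otherwise.

```shell
# Hypothetical helper: run pdf2htmlEX on every PDF in a directory.
# PDF2HTMLEX is an assumed variable; it defaults to the AppImage path used above.
PDF2HTMLEX="${PDF2HTMLEX:-./pdf2htmlex.AppImage}"

convert_all() {
  local dir="${1:-.}"   # directory to scan; defaults to the current one
  local pdf
  for pdf in "$dir"/*.pdf; do
    # When the glob matches nothing it stays literal, so skip non-existent paths.
    [ -e "$pdf" ] || continue
    echo "converting $pdf"
    # Same invocation as the single-file command, one PDF at a time.
    # A failed conversion is reported on stderr and the loop continues.
    APPIMAGE_EXTRACT_AND_RUN=1 "$PDF2HTMLEX" "$pdf" || echo "failed: $pdf" >&2
  done
}
```

After sourcing the snippet, convert_all path/to/pdfs processes that directory, and convert_all with no argument processes the current one. Reporting failures instead of aborting is a design choice so that one malformed PDF does not stop the rest of the batch.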