This sample project showcases OpenAI's question-answering (Q&A) capability for document understanding, using LangChain and ChromaDB. The pipeline works as follows:
- Read PDF
- Convert the PDF pages to images
- Run OCR to extract the text
- Split the extracted text into smaller chunks
- Embed the chunks with OpenAI embeddings and load them into a vector store (ChromaDB)
- Query the model
- Enjoy the answers!
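
The chunking and retrieval steps above can be sketched in plain Python. This is a minimal illustration and not the project's actual code: the names `chunk_text`, `toy_embed`, `cosine`, and `top_chunks` are hypothetical, the chunk size and overlap values are arbitrary, and a simple word-count vector stands in for the OpenAI embeddings and the ChromaDB similarity search, so that the example runs without an API key or extra packages.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question (stand-in for a vector DB query)."""
    q = toy_embed(question)
    return sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)[:k]
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`. In the real pipeline, the retrieved chunks would be passed to the language model together with the question; the overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.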
The following use cases are explored:
- Q&A over a construction contract (in English)
- Q&A over Panama's Data Protection Law (in Spanish)
This code runs directly in Paperspace Gradient or Google Colab.
---- FOR DEMO USE ONLY, NOT FOR COMMERCIAL USE ----