This repository demonstrates how to use LLMs for document processing. It is part of a course, so the XML extraction part is left as a task for the students. The repository is built with Poetry and uses the following libraries:
- Langchain
- Loguru
- Jinja2
- PyPdfium2

To install this project, please use Poetry. The project is built with Python 3.11.

    git clone https://github.com/mfmezger/document-processing-ollama
    cd document-processing-ollama
    poetry install

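Before running the install, it may help to confirm that the active interpreter matches the Python version mentioned above. The following is a minimal sketch, not part of the repository: the helper name `python_is_supported` is hypothetical, and treating 3.11 as a minimum (rather than an exact requirement) is an assumption. The project's `pyproject.toml` remains the authoritative source for the supported versions.

```python
import sys

# Assumption: "built with Python 3.11" is read as a minimum version.
# The project's pyproject.toml is the authoritative source.
REQUIRED = (3, 11)


def python_is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True if the interpreter's (major, minor) is at least `required`."""
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print("Python version OK:", found)
    else:
        print("Python 3.11 or newer is expected, found:", found)
```

If the check reports an older interpreter, Poetry can be pointed at a different one with `poetry env use`, assuming a suitable Python is installed on the machine.
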
The dataset used is the Samples of electronic Invoices Dataset from Mendeley Data. It is available at https://data.mendeley.com/datasets/tnj49gpmtz/2 and is licensed under CC BY 4.0.