This is a LLM-based document QnA pipeline that extracts text from a given document and provides answers to user prompts regrading the content of the document.
The pipeline was tested on a couple of documents containing information of the electoral bonds vis-a-vis the Indian General Elections, from 2019 to 2024 provided by Election Commission of India.
The "party_bonds.pdf" file contains the complete list of electoral bonds encashed by all the listed political parties between 2019 to 2024.
The "comapny_bonds.pdf" file contains the list of private/public entities that provided the bonds within the same timeline and of various denominations. Note that every bond has a unique bond number with it and hence makes tracking transactions easier in these documents.
We have performed LLM-based Named-Entity Recognition and LLM-based Query Extraction specifically to extract data from these documents and run a query-based system to tend to indiviual user prompts based on the contents of these documents.
Provided here is a Colab notebook that can be used to access the StreamLit website where our model has been deployed. You would have to upload a couple of documents and ask questions based on the format given below.
Some sample prompts are given as follows:
"What was the total amount received by <POLITICAL_PARTY> on 20th October 2022?"
"What was the total amount received by <POLITICAL_PARTY> by <PURCHASER_NAME> in the month of July in 2021?"