This project is an application of Retrieval-Augmented Generation (RAG), an AI framework that combines large language models with information retrieved from trusted external sources. I'm currently experimenting with various large language models to extract answers from PDF documents. The research is conducted primarily in Jupyter Notebooks: we input a question, retrieve the relevant passages from the PDF document, and generate a response with the language model. Everything runs on open-source models, locally on your own computer, which keeps the project accessible and reproducible.
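The retrieve-then-generate loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the notebooks' actual code. It assumes the PDF text has already been extracted to a string. It swaps in simple bag-of-words cosine similarity where a real pipeline would typically use embeddings and a vector store. It also takes the language model as a plain `generate` callable (a hypothetical stand-in for a call to a local model). All function names here are illustrative.

```python
import math
import re
from collections import Counter
from typing import Callable


def chunk_text(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping chunks of about `size` words each."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    # Stop early enough that the last chunk is not just the previous chunk's overlap.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def bag_of_words(text: str) -> Counter:
    """Count lowercased alphanumeric tokens (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = bag_of_words(question)
    return sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model's answer in the retrieved context."""
    joined = "\n\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, document_text: str, generate: Callable[[str], str], k: int = 2) -> str:
    """Chunk the document, retrieve relevant chunks, and ask the model."""
    context = retrieve(question, chunk_text(document_text), k)
    return generate(build_prompt(question, context))
```

For a quick dry run without a model, pass `generate=lambda prompt: prompt` to see exactly which context would be sent to the language model.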
To get started, clone the repository and follow the installation instructions below. This project uses Ollama, an open-source tool for running large language models locally; download and install it from the Ollama website (https://ollama.com). Once everything is set up, you can ask questions about your PDF document.
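As a rough sketch of how Python code can talk to Ollama, the snippet below sends a prompt to Ollama's local REST endpoint `/api/generate` using only the standard library. It assumes the Ollama server is running on its default port (11434) and that a model named `llama2` has already been pulled; the model name is an assumption, and the notebooks may call Ollama through a different client or wrapper.

```python
import json
import urllib.request

# Assumption: the Ollama server is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama2") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_generate(prompt: str, model: str = "llama2", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, Ollama returns a single JSON object whose
    # "response" field holds the full generated text.
    return body["response"]
```

Before calling `ollama_generate`, pull the model once from the command line (for example `ollama pull llama2`). Setting `"stream": False` trades incremental output for a single, easy-to-parse JSON reply.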
To set up the project on your local machine, follow these steps:
- Clone the repository:
  ```bash
  git clone https://github.com/ralphcajipe/ask-pdf.git
  ```
- Navigate to the project directory:
  ```bash
  cd ask-pdf
  ```
- Install the required dependencies:
  ```bash
  pip install -r requirements.txt
  ```
For now, the project is structured as Jupyter Notebooks that use PDF files as the data source.
Open the notebook in Jupyter and follow its instructions, which guide you through asking questions about a PDF document.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.