PDF-GPT is a web application that uses OpenAI's GPT to extract and process information from PDF files. The application is built using Flask and can be deployed using Docker and Azure.
The final web app is available here.
Note: You need an OpenAI API key, which you obtain from OpenAI. Windows Central has a guide on how to get OpenAI API keys.
See the development map and project video for more information.
- Upload and process PDF files
- Extract text from PDF files
- Use GPT to search and process the extracted text
- Display processed information on the web interface
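The features above do not say how the extracted text is prepared for search. One common approach, assumed here rather than taken from this project, is to split the text into overlapping chunks before indexing them. The helper name `chunk_text` and its default sizes are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows.

    Illustrative only: the real application may chunk by page, sentence,
    or token count instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in two neighbouring chunks, at the cost of storing some text twice.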
- Docker
- Python 3.9+
- Azure CLI (for deployment)
- Clone the repository:
git clone https://github.com/yourusername/721Final_project_Scott_Lorna.git
cd 721Final_project_Scott_Lorna
- Create an env.list file in the project directory with your API keys:
PINECONE_API_KEY=<your_pinecone_api_key>
PINECONE_API_ENV=<your_pinecone_api_env>
- Working with Pinecone: Pinecone is a vector database service that stores the information extracted from the PDF files and makes it searchable at scale. The application connects to Pinecone using the API key and environment value provided during Pinecone account setup.
- To integrate Pinecone in the application, you'll need to:
- Sign up for a Pinecone account and obtain the API key and environment value.
- Add the API key and environment value to the env.list file.
- Update the application code to interact with Pinecone's vector storage system for processing and storing extracted information.
For more information on working with Pinecone, refer to their official documentation.
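As a minimal sketch of how the application could pick up these values at startup, the snippet below reads `PINECONE_API_KEY` and `PINECONE_API_ENV` from the process environment and fails early if either is missing. The class and function names are hypothetical, and the project's actual code may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PineconeSettings:
    """Hypothetical container for the two values stored in env.list."""
    api_key: str
    environment: str


def load_pinecone_settings(environ: Mapping[str, str] = os.environ) -> PineconeSettings:
    """Read the Pinecone settings, raising if any is missing or empty."""
    required = ("PINECONE_API_KEY", "PINECONE_API_ENV")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return PineconeSettings(
        api_key=environ["PINECONE_API_KEY"],
        environment=environ["PINECONE_API_ENV"],
    )
```

Failing at startup makes a forgotten `--env-file` flag show up immediately, instead of on the first PDF upload.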
- Set up the Pinecone API key and environment: after signing up, you'll receive an API key and environment value. Add these to your env.list file as PINECONE_API_KEY and PINECONE_API_ENV.
- Build the Docker image:
docker build -t pdf-gpt .
- Run the application using Docker:
docker run -p 5000:5000 --env-file env.list pdf-gpt
- Open your browser and navigate to http://localhost:5000.
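The container can take a few seconds to start, so the page may not load right away. The sketch below polls the local URL until the app answers. The function name `wait_for_app` and its timeout values are hypothetical and not part of the project.

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url: str = "http://localhost:5000",
                 timeout: float = 30.0,
                 interval: float = 1.0) -> bool:
    """Return True once the app responds without a server error, False on timeout."""
    # An empty ProxyHandler bypasses any configured HTTP proxy,
    # which should not be used for a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=5):
                return True  # 2xx or 3xx response
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                return True  # the server is up, even if this path is 4xx
        except (urllib.error.URLError, OSError):
            pass  # not accepting connections yet
        time.sleep(interval)
    return False
```

Calling `wait_for_app()` after `docker run` returns `True` as soon as the app is reachable on port 5000.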
- Log in to Azure:
az login
- Create a resource group and a web app:
az group create --name pdf-gpt_group --location <your_azure_region>
az appservice plan create --name pdf-gpt-plan --resource-group pdf-gpt_group --sku B1 --is-linux
az webapp create --resource-group pdf-gpt_group --plan pdf-gpt-plan --name pdf-gpt --deployment-container-image-name <your_docker_image>
- Set environment variables in Azure:
az webapp config appsettings set --resource-group pdf-gpt_group --name pdf-gpt --settings PINECONE_API_KEY=<your_pinecone_api_key>
az webapp config appsettings set --resource-group pdf-gpt_group --name pdf-gpt --settings PINECONE_API_ENV=<your_pinecone_api_env>
az webapp config appsettings set --resource-group pdf-gpt_group --name pdf-gpt --settings WEBSITES_PORT=5000
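The `--settings` option accepts several space-separated KEY=VALUE pairs, so the three commands above can be combined into one. As a sketch, the helpers below read the contents of env.list and build that single command as an argument list. Both function names are hypothetical, and the parser is a simplification: it handles only KEY=VALUE lines and `#` comments.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected KEY=VALUE, got: {raw_line!r}")
        settings[key] = value
    return settings


def build_appsettings_command(resource_group: str,
                              app_name: str,
                              settings: dict[str, str]) -> list[str]:
    """Build one `az webapp config appsettings set` call for all settings."""
    pairs = [f"{key}={value}" for key, value in settings.items()]
    return [
        "az", "webapp", "config", "appsettings", "set",
        "--resource-group", resource_group,
        "--name", app_name,
        "--settings", *pairs,
    ]
```

A caller would parse env.list, add `WEBSITES_PORT` set to `5000`, and pass the resulting list to `subprocess.run(..., check=True)`. Using an argument list rather than a shell string avoids quoting problems if a key contains special characters.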