The goal of this project is to build an interface using Gradio where users can upload PDF files containing personal expense information. Using text embeddings and generative AI, they can then extract relevant information from the uploaded data.
- Get the user data in the proper format (PDF)
- Apply text embeddings to the PDF so that the model (LLaMA) can work with its content
- Send the embedded data to a vector database (Pinecone); a sketch of these first three steps follows this list
- Adjust the prompt and other settings that shape the desired response
- Provide the retrieved data to the model
- Evaluate the response
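The first three steps can be illustrated with a minimal sketch. This is not the project's exact code: the file name `expenses.pdf`, the index name `expenses-index`, the `PINECONE_API_KEY` environment variable, the embedding model, the serverless region, and the chunking parameters are all placeholder assumptions, and it presumes the packages listed later in this file are installed.

```python
import os

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone, ServerlessSpec

# Hypothetical values -- replace with your own file, key, and index name.
PDF_PATH = "expenses.pdf"
INDEX_NAME = "expenses-index"

# Step 1: load the PDF and split it into chunks the embedder can handle.
documents = PyPDFLoader(PDF_PATH).load()
chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(documents)

# Step 2: embed each chunk with a sentence-transformers model.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Step 3: create the Pinecone index if needed and upsert the embedded chunks.
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
if INDEX_NAME not in pc.list_indexes().names():
    pc.create_index(
        name=INDEX_NAME,
        dimension=384,  # all-MiniLM-L6-v2 produces 384-dimensional vectors
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
vectorstore = PineconeVectorStore.from_documents(chunks, embeddings, index_name=INDEX_NAME)
```

Note that the index dimension must match the embedding model: `all-MiniLM-L6-v2` produces 384-dimensional vectors, so switching embedding models means changing the dimension accordingly.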
The project is written in Python, and the recommended environment is Google Colab. For a local setup, additional steps are required; these can be found in the documentation of the libraries mentioned in this file. Additionally, access to a GPU is required.
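A quick way to confirm the runtime actually has a GPU before running the heavier cells:

```python
import torch

# In Colab, enable a GPU via Runtime > Change runtime type before running this.
assert torch.cuda.is_available(), "No GPU detected; enable a GPU runtime first."
print(torch.cuda.get_device_name(0))
```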
Install the required packages, then add the following imports:
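The repository does not pin exact package names or versions; one plausible install command for Colab that covers the imports below is the following (note that newer releases publish the Pinecone client as `pinecone` rather than `pinecone-client`, and `accelerate` is needed if the model is loaded with `device_map="auto"`):

```bash
pip install -q torch transformers accelerate gradio \
    "pinecone-client>=3.0" langchain langchain-pinecone pypdf sentence-transformers
```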
import os
import torch
import transformers
import gradio as gr
from torch import cuda, bfloat16

# Pinecone client for creating and connecting to the vector index
from pinecone import Pinecone, ServerlessSpec

# LangChain components: LLM wrapper, vector store, document loading,
# embeddings, splitting, and chains
from langchain.llms import HuggingFacePipeline
from langchain_pinecone import PineconeVectorStore
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import StuffDocumentsChain, LLMChain, ConversationalRetrievalChain

# Stopping criteria for controlling LLaMA's text generation
from transformers import StoppingCriteria, StoppingCriteriaList
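With the imports in place, the remaining steps (adjusting the prompt, querying the model, and serving the interface) can be sketched as follows. Again, this is a non-authoritative sketch: the checkpoint `meta-llama/Llama-2-7b-chat-hf`, the generation settings, and the reuse of the `vectorstore` object from the earlier ingestion sketch are assumptions, not the project's confirmed choices.

```python
# Hypothetical checkpoint; LLaMA weights are gated, so your HF account needs access.
MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"

# Load the model in bfloat16 on the GPU (device_map="auto" requires `accelerate`).
model = transformers.AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=bfloat16, device_map="auto"
)
tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)

# Stop generating as soon as the model emits its end-of-sequence token.
class StopOnEos(StoppingCriteria):
    def __call__(self, input_ids, scores, **kwargs) -> bool:
        return input_ids[0][-1].item() == tokenizer.eos_token_id

generate = transformers.pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.1,
    repetition_penalty=1.1,
    stopping_criteria=StoppingCriteriaList([StopOnEos()]),
)
llm = HuggingFacePipeline(pipeline=generate)

# Retrieval chain: fetch the most relevant chunks from Pinecone,
# stuff them into the prompt, and let LLaMA answer.
chain = ConversationalRetrievalChain.from_llm(
    llm, retriever=vectorstore.as_retriever()
)

chat_history = []

def ask(question):
    # Each call passes the running history so follow-up questions have context.
    result = chain({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    return result["answer"]

gr.Interface(fn=ask, inputs="text", outputs="text",
             title="Expense PDF Q&A").launch()
```

By default, `ConversationalRetrievalChain` stuffs the retrieved chunks into the prompt, which is where the prompt adjustments from step 4 of the list above would be applied.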
Download the notebook app.ipynb and open it in Google Colab.
Carlos L. - GitHub
This project is licensed under the MIT License.