Chat with PDF using Local VectorStore (FAISS)
This Python script demonstrates document retrieval and question answering capabilities using PDF documents.
FAISS (Facebook AI Similarity Search) is an open-source library developed by Facebook AI Research for efficient similarity search and clustering of dense, high-dimensional vectors. Given a query vector, it quickly retrieves the stored vectors closest to it, even in large datasets. FAISS offers index structures that speed up the search and reduce memory usage, in some cases by returning approximate rather than exact nearest neighbors.
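To make "retrieving items similar to a query" concrete, here is a minimal brute-force sketch in plain Python. It is not how FAISS is implemented; it only illustrates the exhaustive search that FAISS is built to accelerate. The function names, the toy vectors, and the choice of cosine similarity as the metric are assumptions made for illustration (FAISS also supports other metrics such as L2 distance).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the indices of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical 3-dimensional "embeddings"; real ones have far more dimensions.
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

print(top_k([1.0, 0.05, 0.0], vectors, k=2))  # prints [0, 1]
```

This loop compares the query against every stored vector, so its cost grows linearly with the dataset, which is the cost a dedicated index aims to avoid.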
- Environment Setup: The script loads environment variables from a `.env` file using `dotenv`.
- Document Loading: It uses `PyPDFLoader` from `langchain` to load a PDF document named "react.pdf".
- Text Splitting: The loaded document is split into smaller chunks of text using `CharacterTextSplitter` from `langchain`. The chunks have a target size of 1000 characters, overlap by 30 characters, and are split at newline characters.
- Embeddings Generation: OpenAI embeddings are generated for the text chunks using `OpenAIEmbeddings` from `langchain`.
- Vector Store Creation: A vector store is created using `FAISS` from `langchain_community.vectorstores` and populated with the embeddings generated in the previous step.
- Saving Vector Store: The created vector store is saved locally under the name "faiss_index_react".
- Loading Vector Store: The saved vector store is loaded back into memory using `FAISS` with dangerous deserialization enabled. Loading unpickles data from disk, so this should only be done with an index you created yourself or otherwise trust.
- Question Answering Setup: A retrieval-based question answering chain is initialized using `RetrievalQA` from `langchain.chains`. The chain is configured with an OpenAI language model (LLM) and the loaded vector store as its retriever.
- Question Answering: A sample question ("What are the disadvantages of using React?") is passed to the question answering chain.
- Print Result: The answer returned by the chain is printed to the console.
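The steps above can be sketched roughly as follows. This is a reconstruction from the description, not the original script. The file name, index name, chunk settings, and question come from the text. The exact import paths, the `langchain_openai` package, the `chain_type="stuff"` setting, and the `OPENAI_API_KEY` variable are assumptions that depend on the installed LangChain release.

```python
import os

# Values taken from the step descriptions.
PDF_PATH = "react.pdf"
INDEX_DIR = "faiss_index_react"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 30
QUESTION = "What are the disadvantages of using React?"


def main():
    # Third-party imports live here so the module loads without them.
    # These import paths are assumptions and vary between LangChain releases.
    from dotenv import load_dotenv
    from langchain.chains import RetrievalQA
    from langchain.text_splitter import CharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import FAISS
    from langchain_openai import OpenAI, OpenAIEmbeddings

    load_dotenv()  # assumed to supply OPENAI_API_KEY from the .env file

    # Load the PDF and split it into overlapping chunks at newlines.
    documents = PyPDFLoader(PDF_PATH).load()
    splitter = CharacterTextSplitter(
        chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP, separator="\n"
    )
    chunks = splitter.split_documents(documents)

    # Embed the chunks, build the FAISS index, and save it locally.
    embeddings = OpenAIEmbeddings()
    FAISS.from_documents(chunks, embeddings).save_local(INDEX_DIR)

    # Reload the index; the flag is needed because loading unpickles data.
    vectorstore = FAISS.load_local(
        INDEX_DIR, embeddings, allow_dangerous_deserialization=True
    )

    # Build the retrieval QA chain and ask the sample question.
    qa = RetrievalQA.from_chain_type(
        llm=OpenAI(), chain_type="stuff", retriever=vectorstore.as_retriever()
    )
    result = qa.invoke({"query": QUESTION})
    print(result["result"])


# Only run the pipeline when the input PDF is actually present.
if os.path.exists(PDF_PATH):
    main()
else:
    print(f"{PDF_PATH} not found; place it in the current directory first.")
```

Saving and immediately reloading the index is redundant within a single run; it mirrors the described steps and shows how a later run could skip the embedding step by loading the saved index directly.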
- Ensure that the "react.pdf" file exists in the current directory.
- Install the required dependencies specified in the script (e.g., `langchain` and `dotenv`).
- Run the script to perform document retrieval and question answering on the provided PDF document.
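The first two checks in this list can be automated before running the script. The helper below is a hypothetical sketch using only the standard library; its name and default arguments are assumptions, and the module list covers only the two packages named above (the `dotenv` module is provided by the `python-dotenv` package).

```python
import importlib.util
import os


def missing_prerequisites(pdf_path="react.pdf", modules=("langchain", "dotenv")):
    """Return a list of human-readable problems; empty if everything is in place."""
    problems = []
    if not os.path.isfile(pdf_path):
        problems.append(f"file not found: {pdf_path}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not installed: {name}")
    return problems


for problem in missing_prerequisites():
    print(problem)
```

An empty result means the PDF and the listed modules were found; it does not verify that the API credentials in `.env` are valid.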