The chatbot allows users to upload a PDF, process the document, and ask questions about the content.
- main.py: Entry point of the application.
- data_preprocessing.py: Handles PDF reading, text extraction, and chunking.
- data_ingestion.py: Manages vector database operations and data ingestion.
- chat_pipeline.py: Handles queries using chains, retrievers, memory, and compressors.
- Python 3.8+
- Streamlit
- PyPDF2
- LangChain
- FAISS
- OpenAI
- python-dotenv
- streamlit_extras
- Clone the repository.
- Install the required packages:
pip install -r requirements.txt
- Add a .env file and set OPENAI_API_KEY in it:
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
- Run the application:
streamlit run src/main.py
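
The application needs OPENAI_API_KEY to be available in its environment at startup. The following is a minimal sketch of how the key could be loaded and validated, assuming python-dotenv's `load_dotenv` is used for the .env file. The helper name `get_openai_api_key` is hypothetical and is not taken from the repository.

```python
import os

try:
    # python-dotenv is listed in the requirements; load_dotenv reads .env into os.environ.
    from dotenv import load_dotenv
except ImportError:
    # Lets this sketch run where python-dotenv is not installed.
    load_dotenv = None


def get_openai_api_key() -> str:
    """Return OPENAI_API_KEY or raise a clear error (hypothetical helper)."""
    if load_dotenv is not None:
        # By default, variables that are already set in the environment are not overridden.
        load_dotenv()
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it to .env or export it in the shell."
        )
    return key
```

Calling a check like this once at startup would surface a missing key as a readable error before the first request to OpenAI is attempted.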
- The user uploads PDFs, which are processed to extract textual data.
- Extracted text is split into chunks and embedded into vectors.
- Embeddings are stored in a vector store. Compression techniques are applied to optimize storage.
- Conversation memory keeps the context of earlier turns.
- The user's query is used to retrieve relevant documents from the vector store.
- A context is built from the retrieved documents, and the system generates a response from that context and the conversation memory.
- The user interface displays chat history, showing the interactive conversation flow.
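
The steps above can be sketched with the standard library alone. This is an illustration of the flow, not the project's code: the word-count `embed` function stands in for OpenAI embeddings, `ToyVectorStore` stands in for FAISS, and every name and default value here (`split_into_chunks`, `build_prompt`, `chunk_size`, `overlap`, `k`) is an assumption. Compression and the LLM call are left out.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text):
    """Toy embedding: a word-count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database such as FAISS."""

    def __init__(self):
        self.items = []  # list of (chunk, vector) pairs

    def add(self, chunks):
        self.items.extend((chunk, embed(chunk)) for chunk in chunks)

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, store, history, k=2):
    """Combine retrieved context and past (question, answer) turns into one prompt."""
    context = "\n".join(store.search(question, k))
    past = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return f"Context:\n{context}\n\nHistory:\n{past}\n\nQuestion: {question}"
```

In use, the extracted PDF text would be passed through `split_into_chunks`, the chunks added to the store, and each user question turned into a prompt with `build_prompt` before being sent to the language model. The resulting question and answer pair would then be appended to `history` so that later turns can refer back to it.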