InspiringPDFs

InspiringPDFs is a chatbot that lets users upload a PDF, process the document, and ask questions about its content.

Project Structure

  • main.py: Entry point of the application.
  • data_preprocessing.py: Handles PDF reading, text extraction, and chunking.
  • data_ingestion.py: Manages vector database operations and data ingestion.
  • chat_pipeline.py: Handles queries using chains, retrievers, memory, and compressors.

Requirements

  • Python 3.8+
  • Streamlit
  • PyPDF2
  • LangChain
  • FAISS
  • OpenAI
  • python-dotenv
  • streamlit_extras

Installation

  1. Clone the repository.
  2. Install the required packages:
    pip install -r requirements.txt
  3. Create a .env file and set your OPENAI_API_KEY in it:
    OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
  4. Run the application:
    streamlit run src/main.py
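
As a rough illustration of step 3, the key could be read at startup along these lines. This is a minimal sketch, not the project's actual code: the function name load_openai_key is hypothetical, and the python-dotenv import is guarded so the snippet still runs where the package is missing.

```python
import os


def load_openai_key(env_path=".env"):
    """Return OPENAI_API_KEY, reading a .env file first when python-dotenv is installed."""
    try:
        # python-dotenv is listed under Requirements; the guard is only for this sketch.
        from dotenv import load_dotenv
        load_dotenv(env_path)  # by default, variables already set in the environment win
    except ImportError:
        pass

    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error surfacing later inside the chat pipeline.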

PDF CHAT ARCHITECTURE

image

LLM-Powered PDF Chat System Architecture with Langchain Memory and Compression


Flow Explanation:

1. Upload and Data Extraction:

  • The user uploads PDFs, which are processed to extract textual data.
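
The extraction step might look roughly like the sketch below. It assumes PyPDF2's PdfReader (PyPDF2 is listed under Requirements); the helper names pages_to_text and read_pdf are hypothetical and not taken from data_preprocessing.py.

```python
def pages_to_text(pages):
    """Join the text of page-like objects that expose extract_text()."""
    parts = []
    for page in pages:
        # Image-only pages may yield an empty result, so fall back to "".
        text = page.extract_text() or ""
        if text.strip():
            parts.append(text.strip())
    return "\n".join(parts)


def read_pdf(file_obj):
    """Extract text from one PDF, given a path or a binary file object."""
    from PyPDF2 import PdfReader  # imported lazily so pages_to_text works without PyPDF2
    return pages_to_text(PdfReader(file_obj).pages)
```

Keeping the page-joining logic separate from the reader makes it possible to test the text handling without a real PDF.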

2. Text Chunking and Embedding:

  • Extracted text is split into chunks and embedded into vectors.
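
To make the chunking idea concrete, here is a minimal character-based splitter with overlap. It is a stand-in for whatever splitter the project actually uses (likely one from LangChain); the name chunk_text and the default sizes are assumptions chosen for illustration.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, which generally helps retrieval. Each chunk would then be passed to an embedding model to obtain its vector.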

3. Vector Store and Compression:

  • Embeddings are stored in a vector store. Compression techniques are applied to optimize storage.
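
Conceptually, a vector store keeps (vector, text) pairs and returns the texts whose vectors lie closest to a query vector. The toy class below shows that idea with plain cosine similarity. It is only an illustration: the project relies on FAISS for this, and TinyVectorStore is a hypothetical name.

```python
import math


class TinyVectorStore:
    """In-memory stand-in for a vector index, using brute-force cosine similarity."""

    def __init__(self):
        self._items = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self._items.append((list(vector), text))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query_vector, k=3):
        """Return the texts of the k stored vectors most similar to query_vector."""
        ranked = sorted(
            self._items,
            key=lambda item: self._cosine(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]
```

A brute-force scan like this is fine for a handful of chunks; a dedicated index such as FAISS is what makes the same lookup practical for large documents.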

4. Memory and Query Processing:

  • Conversation memory keeps earlier turns as context. The user's query is then used to retrieve the relevant documents.
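
One simple way to keep conversation context is a sliding window over the most recent turns. The sketch below illustrates that approach in plain Python; the class name WindowMemory is hypothetical, and the project itself uses LangChain's memory components, whose behaviour may differ.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent question/answer turns."""

    def __init__(self, max_turns=3):
        # A deque with maxlen silently drops the oldest turn when full.
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, question, answer):
        self._turns.append((question, answer))

    def as_text(self):
        """Render the remembered turns as text suitable for a prompt."""
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self._turns)
```

Bounding the window keeps the prompt from growing without limit as the chat gets longer, at the cost of forgetting the oldest turns.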

5. Context Building and Response Generation:

  • Context is built from retrieved documents. The system generates responses using context and memory.
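
Context building can be pictured as assembling one prompt from the retrieved chunks, the chat history, and the question. The following is a minimal sketch under stated assumptions: the function name build_prompt, the prompt wording, and the character budget are all hypothetical, and the project's chain may construct its prompt differently.

```python
def build_prompt(question, documents, history="", max_context_chars=3000):
    """Combine retrieved chunks, chat history, and the question into one prompt."""
    context_parts = []
    used = 0
    for doc in documents:  # documents are assumed to be ordered by relevance
        if used + len(doc) > max_context_chars:
            break  # stop before exceeding the context budget
        context_parts.append(doc)
        used += len(doc)

    context = "\n---\n".join(context_parts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Chat history:\n{history}\n\n"
        f"Question: {question}"
    )
```

The resulting string would then be sent to the language model, and the answer appended to the chat history for the next turn.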

6. User Interaction and Chat History:

  • The user interface displays chat history, showing the interactive conversation flow.

Output

image

Chat History

image