InspiringPDFs

InspiringPDFs is a chatbot that lets users upload a PDF, processes the document, and answers questions about its content.

Project Structure

main.py: Entry point of the application.
data_preprocessing.py: Handles PDF reading, text extraction, and chunking.
data_ingestion.py: Manages vector database operations and data ingestion.
chat_pipeline.py: Handles queries using chains, retrievers, memory, and compressors.

Requirements

Python 3.8+
Streamlit
PyPDF2
LangChain
FAISS
OpenAI
python-dotenv
streamlit_extras

Installation

Clone the repository.
Install the required packages:
```
pip install -r requirements.txt
```

Add a .env file containing your OPENAI_API_KEY:

OPENAI_API_KEY = sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
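
A minimal sketch of how the key could be read at startup, assuming python-dotenv's `load_dotenv` (listed under Requirements) is used. The function name `get_openai_api_key` is hypothetical and not taken from the repository.

```python
import os


def get_openai_api_key() -> str:
    """Return the OpenAI key from the environment, loading .env if possible."""
    try:
        # load_dotenv() locates a .env file and copies its entries into
        # os.environ without overriding variables that are already set.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        # python-dotenv is not installed: rely on variables exported in the shell.
        pass

    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file")
    return key
```

Failing early with a clear message avoids a less obvious authentication error on the first question.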

Run the application:
```
streamlit run src/main.py
```

PDF Chat Architecture

LLM-Powered PDF Chat System Architecture with LangChain Memory and Compression

Flow Explanation:

1. Upload and Data Extraction:

The user uploads PDFs, which are processed to extract textual data.
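
The extraction step could look roughly like the sketch below, assuming PyPDF2 2.0 or later (`PdfReader`, `page.extract_text()`). The helper names `join_page_text` and `extract_pdf_text` are illustrative and are not taken from data_preprocessing.py.

```python
from typing import Iterable


def join_page_text(pages: Iterable) -> str:
    """Concatenate the text of page objects that expose extract_text()."""
    parts = []
    for page in pages:
        # Scanned, image-only pages typically yield an empty string.
        text = page.extract_text() or ""
        if text.strip():
            parts.append(text.strip())
    return "\n".join(parts)


def extract_pdf_text(pdf_file) -> str:
    """Read one PDF (a path or a file-like object) and return its text."""
    # Imported lazily so the helper above stays usable without PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_file)
    return join_page_text(reader.pages)
```

An uploaded file object can be passed in place of a path, since `PdfReader` accepts file-like objects.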

2. Text Chunking and Embedding:

Extracted text is split into chunks and embedded into vectors.
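
As an illustration of the chunking half of this step, here is a minimal character-window splitter with overlap. The function name and the default sizes are assumptions; the project may well use one of LangChain's text splitters instead.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.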

3. Vector Store and Compression:

Embeddings are stored in a vector store. Compression techniques are applied to optimize storage.
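
To make the embed-and-store idea concrete, the sketch below uses a toy hashed bag-of-words embedding and a brute-force in-memory index. Both are stand-ins chosen so the example runs with the standard library only: the real project lists OpenAI and FAISS for these roles, and `toy_embed` and `ToyVectorStore` are hypothetical names. The compression step is not shown.

```python
import math
import re
import zlib
from typing import List, Tuple


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Hash each word into one of `dim` buckets and L2-normalise the counts."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class ToyVectorStore:
    """In-memory stand-in for a vector index: brute-force cosine search."""

    def __init__(self) -> None:
        self._items: List[Tuple[List[float], str]] = []

    def add(self, chunks: List[str]) -> None:
        for chunk in chunks:
            self._items.append((toy_embed(chunk), chunk))

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str]]:
        """Return up to k (score, chunk) pairs, best match first."""
        q = toy_embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), chunk)
                  for vec, chunk in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

A dedicated index such as FAISS replaces the linear scan in `search` with a structure that scales to many more vectors.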

4. Memory and Query Processing:

Memory techniques store conversation context. The user's query is processed to retrieve relevant documents.
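
The two ideas in this step can be sketched as a sliding-window memory plus a retrieval function. This is an assumption-laden illustration: `WindowMemory` and `retrieve` are hypothetical names, and keyword overlap stands in for the vector-similarity retriever that chat_pipeline.py presumably uses.

```python
from collections import deque
from typing import Deque, List, Tuple


class WindowMemory:
    """Keep only the most recent `max_turns` (question, answer) pairs."""

    def __init__(self, max_turns: int = 3) -> None:
        self._turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def save(self, question: str, answer: str) -> None:
        self._turns.append((question, answer))  # oldest turn drops out when full

    def as_text(self) -> str:
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self._turns)


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Rank chunks by how many distinct query words they contain."""
    words = set(query.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(words & set(c.lower().split())),
                    reverse=True)
    return ranked[:k]
```

Bounding the window keeps the prompt size predictable as the conversation grows.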

5. Context Building and Response Generation:

Context is built from retrieved documents. The system generates responses using context and memory.
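
A rough sketch of how retrieved chunks and memory might be combined into one prompt before the LLM call. The prompt wording, the `max_chars` budget, and the name `build_prompt` are assumptions for illustration; the LLM call itself is left out.

```python
from typing import List


def build_prompt(question: str, docs: List[str], history: str = "",
                 max_chars: int = 2000) -> str:
    """Assemble a prompt from retrieved chunks, chat history, and the question."""
    context_parts = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        if used + len(doc) > max_chars:
            break  # stop before the context budget is exceeded
        context_parts.append(f"[{i}] {doc}")
        used += len(doc)
    context = "\n".join(context_parts) or "(no relevant passages found)"
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Chat history:\n{history or '(none)'}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the passages makes it possible to ask the model to cite which chunk supports its answer.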

6. User Interaction and Chat History:

The user interface displays chat history, showing the interactive conversation flow.

Output

Chat History