Content-Engine-RAG-for-PDF

Content Engine is a RAG system that analyzes and compares multiple PDF documents, identifying and highlighting their differences. It uses Retrieval Augmented Generation (RAG) techniques to retrieve, assess, and generate insights from the documents.

Primary language: Jupyter Notebook

PDF_LLM

Demo

Internshala.Assg.mp4

Overview

Content Engine is a Retrieval Augmented Generation (RAG) system that processes multiple PDF documents to analyze, compare, and highlight their differences. It splits the documents into chunks, embeds the chunks, retrieves those most relevant to a question, and passes them to a language model to generate a response. The project uses HuggingFace embeddings, a FAISS vector store, and a LlamaCpp language model for document embedding and querying.

Features

- Upload and process multiple PDF documents.

- Analyze and compare documents to identify and highlight differences.

- Utilize Retrieval Augmented Generation (RAG) for effective information retrieval and generation.

- Maintain chat history for contextual conversation.

- Streamlit interface for an interactive user experience.
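
The compare-with-context idea in the features above can be sketched in plain Python. This is a minimal illustration, not the project's code: word-count cosine similarity stands in for real embeddings, and the names `bag_of_words`, `best_chunk`, and `build_comparison_prompt`, as well as the sample file names and sentences, are hypothetical. For each document it picks the chunk closest to the question, then places those chunks next to the chat history in one prompt that a language model could answer from.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_chunk(query, chunks):
    """Return the chunk most similar to the query."""
    q = bag_of_words(query)
    return max(chunks, key=lambda chunk: cosine(q, bag_of_words(chunk)))


def build_comparison_prompt(query, docs, history):
    """Combine chat history, the best chunk of each document, and the question."""
    lines = [f"{role}: {text}" for role, text in history]
    for name, chunks in docs.items():
        lines.append(f"[{name}] {best_chunk(query, chunks)}")
    lines.append(f"Question: {query}")
    return "\n".join(lines)


# Hypothetical sample data: two documents, already split into chunks.
docs = {
    "report_a.pdf": ["Revenue grew 10 percent in 2022.", "The company has 500 employees."],
    "report_b.pdf": ["Revenue fell 3 percent in 2022.", "The company opened two new offices."],
}
history = [("user", "Which reports are loaded?"), ("assistant", "report_a.pdf and report_b.pdf")]

prompt = build_comparison_prompt("How did revenue change in 2022?", docs, history)
print(prompt)
```

Retrieving one chunk per document, rather than the top chunks overall, keeps both sides of the comparison in the prompt even when one document matches the question much more strongly than the other.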

Technologies Used

Streamlit: For creating the web interface.

LangChain: For implementing the conversational retrieval chain.

HuggingFace Embeddings: For generating document embeddings.

LlamaCpp: For the language model.

FAISS: For the vector store to handle document retrieval.

PyPDFLoader: For loading and processing PDF documents.

RecursiveCharacterTextSplitter: For splitting text into manageable chunks.

ConversationBufferMemory: For maintaining chat history.
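
To make the chunking step concrete, here is a simplified pure-Python sketch of the idea behind recursive character splitting. It is not LangChain's RecursiveCharacterTextSplitter: the function name `split_recursive` is hypothetical, and chunk overlap is left out. It tries the coarsest separator first (paragraph breaks), then falls back to finer ones (line breaks, spaces, single characters) only for pieces that are still too long.

```python
def split_recursive(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    The separators tuple must end with "" so that a hard cut is
    always available as the last resort.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep merging pieces while the chunk still fits.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: retry with finer separators.
            chunks.extend(split_recursive(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample text with two paragraphs.
sample = "Clause one covers payment.\n\nClause two covers delivery and returns."
chunks = split_recursive(sample, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

Splitting on natural boundaries first tends to keep sentences and paragraphs intact, which generally gives the embedding step more coherent text than fixed-width cuts would.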

Prerequisites

Python 3.7 or higher

Streamlit

LangChain

HuggingFace Transformers

FAISS

LlamaCpp

PyPDFLoader
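
A quick way to check an environment against this list is sketched below. The import names are assumptions, not taken from this repository: `faiss` for FAISS, `llama_cpp` for LlamaCpp, `transformers` for HuggingFace Transformers, and `pypdf` as the package behind PyPDFLoader. Adjust them if your installation differs.

```python
import sys
from importlib.util import find_spec

# Assumed import names for the prerequisites listed above.
REQUIRED_MODULES = ["streamlit", "langchain", "transformers", "faiss", "llama_cpp", "pypdf"]


def python_ok(minimum=(3, 7)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


print("Python version OK:", python_ok())
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list on the second line means every assumed module was found; any names printed there still need to be installed.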