Content Engine is a Retrieval-Augmented Generation (RAG) system that processes multiple PDF documents to analyze, compare, and highlight their differences. It splits each document into chunks, embeds the chunks, and stores them in a vector index. For each query, it retrieves the most relevant chunks and passes them to a language model, which generates the response.
- Upload and process multiple PDF documents.
- Analyze and compare documents to identify and highlight differences.
- Use Retrieval-Augmented Generation (RAG) for information retrieval and answer generation.
- Maintain chat history for contextual conversation.
- Provide a Streamlit interface for an interactive user experience.
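
To illustrate what maintaining chat history for contextual conversation involves, here is a minimal sketch in plain Python. It is not the project's code: `ChatBuffer` and `build_prompt` are hypothetical names, the prompt layout is an assumption, and the sample questions and answers are made up. The idea is only that earlier turns are stored verbatim and prepended to each new prompt, so a follow-up such as "How does that compare?" can be resolved.

```python
from dataclasses import dataclass, field


@dataclass
class ChatBuffer:
    """Hypothetical stand-in for a conversation buffer: keeps every turn verbatim."""
    turns: list = field(default_factory=list)

    def add(self, question: str, answer: str) -> None:
        # Store one completed exchange.
        self.turns.append((question, answer))

    def build_prompt(self, context: str, question: str) -> str:
        # Assumed prompt layout: retrieved context, then prior turns, then the new question.
        history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)
        return (
            f"Context:\n{context}\n\n"
            f"Chat history:\n{history}\n\n"
            f"User: {question}\nAssistant:"
        )


# Made-up example data, for illustration only.
buffer = ChatBuffer()
buffer.add("What is Alpha's revenue?", "Alpha reports revenue of 10M.")
prompt = buffer.build_prompt(
    context="Beta reports revenue of 12M.",
    question="How does that compare to Beta?",
)
print(prompt)
```

Because the buffer keeps every turn, the prompt grows with the conversation; a long session may need truncation or summarization to stay within the model's context window.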
Streamlit: For creating the web interface.
LangChain: For implementing the conversational retrieval chain.
HuggingFace Embeddings: For generating document embeddings.
LlamaCpp: For the language model.
FAISS: For the vector store to handle document retrieval.
PyPDFLoader: For loading and processing PDF documents.
RecursiveCharacterTextSplitter: For splitting text into manageable chunks.
ConversationBufferMemory: For maintaining chat history.
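
The way these components fit together can be sketched without any of the libraries installed. The example below is a simplified illustration under stated assumptions, not the project's implementation: a fixed-size character window stands in for RecursiveCharacterTextSplitter, a toy bag-of-words vector stands in for the HuggingFace embeddings, and a linear cosine-similarity scan stands in for the FAISS index. All function names and sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=40, overlap=10):
    """Cut text into fixed-size character windows that overlap.

    Simplification: the real splitter prefers to break on separators
    such as paragraphs and sentences before falling back to characters.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: word counts (a stand-in for a learned dense vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (linear scan, no index)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Made-up chunks standing in for text extracted from two PDFs.
chunks = [
    "Alpha Corp lists supply chain risk as its main concern.",
    "Beta Inc reports revenue growth driven by cloud services.",
    "Alpha Corp revenue declined because of hardware sales.",
]
top = retrieve("What drives revenue growth at Beta?", chunks, k=1)
print(top[0])

pieces = split_text("abcdefghij", chunk_size=4, overlap=2)
print(pieces)
```

In the real pipeline the retrieved chunks, together with the chat history, would form the prompt passed to the language model. The overlap between neighboring chunks reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.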
Python 3.7 or higher
Streamlit
LangChain
HuggingFace Transformers
FAISS
LlamaCpp
PyPDFLoader