Codebase RAG Assistant is a Streamlit-based application that lets you quickly index a GitHub repository, store the code as vector embeddings in Pinecone, and then retrieve relevant context to answer questions about the codebase using an LLM (GPT-4o). Perfect for large projects that need fast, intelligent codebase querying.
- Clone – User enters a GitHub repo URL. The app clones the repo locally, reads relevant files, and splits them into chunks.
- Embed & Store – Vector embeddings are created for each chunk and stored in Pinecone under a unique namespace.
- Wait for Ingestion – The app polls Pinecone to ensure the vector count matches the number of chunks added.
- Query – Once ingestion completes, users can ask questions via the chat. The top-matching chunks from Pinecone are inserted into a GPT-4o prompt, and the model returns a context-grounded answer.
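The chunking half of the Clone step can be sketched with a minimal fixed-size splitter with overlap. This is a dependency-free stand-in for LangChain's text splitter (which prefers to break on newlines and code boundaries rather than mid-token); the `chunk_size` and `chunk_overlap` values are illustrative, not the app's actual settings:

```python
def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    Simplified stand-in for LangChain's text splitter: each chunk starts
    (chunk_size - chunk_overlap) characters after the previous one, so
    adjacent chunks share chunk_overlap characters of context.
    """
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must exceed chunk_overlap")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

The overlap keeps a function signature and its body from being severed across a chunk boundary, which improves retrieval quality for code.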
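The Wait for Ingestion step can be sketched as a generic polling loop. Here `get_vector_count` is a hypothetical zero-argument callable that would wrap a Pinecone index-stats lookup for the repo's namespace; the timeout and interval defaults are illustrative:

```python
import time

def wait_for_ingestion(get_vector_count, expected: int,
                       timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll until the index reports at least `expected` vectors.

    `get_vector_count` is any zero-argument callable returning the current
    vector count (in the app, a wrapper around a Pinecone stats query).
    Returns True once the count is reached, False if the timeout elapses.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_vector_count() >= expected:
            return True
        time.sleep(interval)
    return False
```

Polling is needed because Pinecone upserts are eventually consistent: vectors are not queryable the instant the upsert call returns, so querying too early would silently retrieve from an incomplete index.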
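The prompt-assembly half of the Query step reduces to string formatting: join the top-matching chunks into a context block and append the user's question. The template wording below is illustrative, not the app's exact prompt:

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a context-grounded prompt from the top vector-search matches.

    The retrieved code chunks are joined with a visible separator so the
    model can tell where one chunk ends and the next begins.
    """
    context = "\n\n---\n\n".join(retrieved_chunks)
    return (
        "You are a codebase assistant. Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string is what gets sent to GPT-4o as the user message; grounding the instruction in "only the context below" is what keeps answers tied to the retrieved code rather than the model's general knowledge.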
- Streamlit for the UI and user interactions.
- GitPython for cloning GitHub repositories locally.
- LangChain modules:
  - `langchain_openai` for OpenAI embeddings and GPT-4o chat completion,
  - `langchain.text_splitter` for chunking code into smaller pieces,
  - `langchain.docstore.document` to structure code chunks as documents.
- Pinecone for vector database storage and retrieval.
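How chunks get structured as documents before embedding can be sketched with a stdlib dataclass that mirrors the `page_content`/`metadata` shape of LangChain's `Document`, keeping the sketch dependency-free; the `CodeDocument` name and metadata keys are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class CodeDocument:
    """Stdlib stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def chunks_to_documents(chunks: list[str], source_path: str) -> list[CodeDocument]:
    """Wrap raw code chunks as documents, tagging each with its source file
    and position so retrieved context can be traced back to the repo."""
    return [
        CodeDocument(page_content=chunk,
                     metadata={"source": source_path, "chunk": i})
        for i, chunk in enumerate(chunks)
    ]
```

Carrying the source path in metadata is what lets answers cite which file a retrieved chunk came from.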