ragpilot

Local-first RAG Assistant powered by FastAPI, LangChain, Ollama (Mistral 7B), and FAISS.


Local LLM Assistant

Local-first retrieval-augmented generation (RAG) assistant with conversational memory.

This repository hosts a FastAPI backend and a React frontend with a professional chatbot UI.

Project Structure

backend/   # FastAPI application with RAG capabilities
frontend/  # React + Vite interface with professional chat UI

Prerequisites

You will need Docker Desktop and Ollama installed on your system, at least 8 GB of RAM, and 10 GB of free storage for the models.

Quick Start with Docker

Prerequisites

  • Docker and Docker Compose
  • Ollama running locally with the required models pulled (a quick sanity check is sketched after this list):
    ollama serve
    ollama pull mistral:7b
    ollama pull nomic-embed-text
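
To sanity-check that Ollama is reachable and both models are pulled before starting the stack, a small standalone script along these lines can help. It is a hypothetical helper, not part of this repository; it assumes Ollama's default port and its /api/tags endpoint, and requires the requests package.

import requests

OLLAMA_URL = "http://localhost:11434"          # default Ollama port
REQUIRED = {"mistral:7b", "nomic-embed-text"}  # models used by this project

resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5)
resp.raise_for_status()
installed = {m["name"] for m in resp.json().get("models", [])}

# Installed names may carry a tag suffix (e.g. nomic-embed-text:latest).
missing = {m for m in REQUIRED if not any(name.startswith(m) for name in installed)}
if missing:
    print("Run `ollama pull` for:", ", ".join(sorted(missing)))
else:
    print("Ollama is up and all required models are available.")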

Run the entire project

# Start both backend and frontend
docker-compose up -d

# View logs
docker-compose logs -f

# Stop everything
docker-compose down

Access the application

Once the containers are running, the backend API is available at http://localhost:8000 (FastAPI serves interactive docs at http://localhost:8000/docs by default). The frontend is served on the port mapped in docker-compose.yml.

Development Setup

Backend

See backend/README.md for backend details.

Run the backend locally:

# from the repository root
pip install -r backend/requirements.txt
uvicorn backend.main:app --reload
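
For orientation, the request shape used throughout this README (a JSON body with a question field) corresponds to an endpoint roughly like the stripped-down sketch below. This is hypothetical and the answer field name is an assumption; check backend/main.py for the real schema.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    question: str  # matches the request body shown in the API Testing section

class QueryResponse(BaseModel):
    answer: str    # assumed field name; the real response schema may differ

@app.post("/query", response_model=QueryResponse)
def query(req: QueryRequest) -> QueryResponse:
    # The real app runs the RAG chain here (retrieve relevant chunks, then generate).
    return QueryResponse(answer=f"You asked: {req.question}")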

Frontend

The frontend (in frontend/) uses Vite, React, TypeScript, Tailwind CSS, and shadcn/ui.

cd frontend
npm install
npm run dev

Create frontend/.env.local with:

VITE_API_BASE_URL=http://localhost:8000

Testing

Backend Tests

PYTHONPATH=. pytest backend/tests
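
New tests can follow the pattern of the minimal sketch below. It is illustrative only, not an existing test: it imports the app as backend.main (matching the uvicorn command above) and assumes Ollama is running so the endpoint can answer.

# backend/tests/test_query_example.py  (illustrative sketch, not an existing file)
from fastapi.testclient import TestClient

from backend.main import app  # same import path as the uvicorn command above

client = TestClient(app)

def test_query_returns_ok():
    # Assumes Ollama is running locally; the endpoint may fail otherwise.
    resp = client.post("/query", json={"question": "What is my latest account balance?"})
    assert resp.status_code == 200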

API Testing

Test the query endpoint:

# PowerShell
$body = @{ question = "What is my latest account balance?" } | ConvertTo-Json
Invoke-RestMethod -Uri "http://localhost:8000/query" -Method POST -Body $body -ContentType "application/json"

# curl
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is my latest account balance?"}'

Features

  • Document Q&A: Upload and query PDF, DOCX, CSV, TXT files
  • Conversational Memory: AI remembers conversation context
  • Professional Chat UI: Modern interface with avatars and timestamps
  • Local LLM: Uses Ollama for privacy-first AI
  • RAG Pipeline: Semantic search with vector embeddings (a minimal sketch follows this list)
  • Real-time Responses: Fast document retrieval and generation
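
To give a feel for how the RAG pipeline fits together, here is a minimal, self-contained sketch using the same stack (LangChain, FAISS, Ollama). It is not the project's actual pipeline; it assumes the langchain-community and faiss-cpu packages, the models listed under Configuration, and Ollama running locally.

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS

# Toy documents standing in for uploaded files after chunking.
docs = [
    "Account 1234 had a closing balance of $5,210.44 in March.",
    "Account 1234 had a closing balance of $4,980.10 in February.",
]

# Embed the chunks and build an in-memory FAISS index.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
store = FAISS.from_texts(docs, embeddings)

# Retrieve the chunks most relevant to the question.
question = "What is my latest account balance?"
context = store.similarity_search(question, k=2)

# Ask the local LLM to answer from the retrieved context only.
llm = Ollama(model="mistral:7b")
prompt = (
    "Answer using only this context:\n"
    + "\n".join(d.page_content for d in context)
    + f"\n\nQuestion: {question}"
)
print(llm.invoke(prompt))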

Configuration

Models

  • LLM: mistral:7b (configurable in backend/.env)
  • Embeddings: nomic-embed-text
  • Alternative: Switch to llama3.2:1b for faster responses

Environment Variables

# backend/.env
# host.docker.internal lets the Docker containers reach Ollama on the host;
# use http://localhost:11434 when running the backend outside Docker
OLLAMA_BASE_URL=http://host.docker.internal:11434
EMBEDDING_MODEL=nomic-embed-text
LLM_MODEL=mistral:7b
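
A minimal sketch of how a backend process might read these values at startup (illustrative only; the fallback defaults are assumptions, not the repository's actual settings code):

import os

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "nomic-embed-text")
LLM_MODEL = os.getenv("LLM_MODEL", "mistral:7b")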