ragpilot

Local-first RAG Assistant powered by FastAPI, LangChain, Ollama (Mistral 7B), and FAISS.


Local LLM Assistant

Local-first retrieval-augmented generation (RAG) assistant with conversational memory.

This repository hosts a FastAPI backend and a React frontend with a professional chatbot UI.

Project Structure

backend/   # FastAPI application with RAG capabilities
frontend/  # React + Vite interface with professional chat UI

Prerequisites

You will need Docker Desktop and Ollama installed on your system, at least 8 GB of RAM, and 10 GB of free storage for the models.

Quick Start with Docker

Prerequisites

  • Docker and Docker Compose
  • Ollama running locally with the required models pulled (a quick sanity check is sketched after this list):
    ollama serve
    ollama pull mistral:7b
    ollama pull nomic-embed-text
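
To sanity-check that Ollama is reachable and both models are pulled before starting the stack, a small standalone script along these lines can help. It is a hypothetical helper, not part of this repository; it assumes Ollama's default port and its /api/tags endpoint, and requires the requests package.

import requests

OLLAMA_URL = "http://localhost:11434"          # default Ollama port
REQUIRED = {"mistral:7b", "nomic-embed-text"}  # models used by this project

resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5)
resp.raise_for_status()
installed = {m["name"] for m in resp.json().get("models", [])}

# Installed names may carry a tag suffix (e.g. nomic-embed-text:latest).
missing = {m for m in REQUIRED if not any(name.startswith(m) for name in installed)}
if missing:
    print("Run `ollama pull` for:", ", ".join(sorted(missing)))
else:
    print("Ollama is up and all required models are available.")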

Run the entire project

# Start both backend and frontend
docker-compose up -d

# View logs
docker-compose logs -f

# Stop everything
docker-compose down

Access the application

Once the containers are running, the backend API is available at http://localhost:8000 (FastAPI serves interactive docs at http://localhost:8000/docs by default). The frontend is served on the port mapped in docker-compose.yml.

Development Setup

Backend

See backend/README.md for backend details.

Run the backend locally:

# from the repository root
pip install -r backend/requirements.txt
uvicorn backend.main:app --reload
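
For orientation, the request shape used throughout this README (a JSON body with a question field) corresponds to an endpoint roughly like the stripped-down sketch below. This is hypothetical and the answer field name is an assumption; check backend/main.py for the real schema.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    question: str  # matches the request body shown in the API Testing section

class QueryResponse(BaseModel):
    answer: str    # assumed field name; the real response schema may differ

@app.post("/query", response_model=QueryResponse)
def query(req: QueryRequest) -> QueryResponse:
    # The real app runs the RAG chain here (retrieve relevant chunks, then generate).
    return QueryResponse(answer=f"You asked: {req.question}")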

Frontend

The frontend (in frontend/) uses Vite, React, TypeScript, Tailwind CSS, and shadcn/ui.

cd frontend
npm install
npm run dev

Create frontend/.env.local with:

VITE_API_BASE_URL=http://localhost:8000

Testing

Backend Tests

PYTHONPATH=. pytest backend/tests
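
New tests can follow the pattern of the minimal sketch below. It is illustrative only, not an existing test: it imports the app as backend.main (matching the uvicorn command above) and assumes Ollama is running so the endpoint can answer.

# backend/tests/test_query_example.py  (illustrative sketch, not an existing file)
from fastapi.testclient import TestClient

from backend.main import app  # same import path as the uvicorn command above

client = TestClient(app)

def test_query_returns_ok():
    # Assumes Ollama is running locally; the endpoint may fail otherwise.
    resp = client.post("/query", json={"question": "What is my latest account balance?"})
    assert resp.status_code == 200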

API Testing

Test the query endpoint:

# PowerShell
$body = @{ question = "What is my latest account balance?" } | ConvertTo-Json
Invoke-RestMethod -Uri "http://localhost:8000/query" -Method POST -Body $body -ContentType "application/json"

# curl
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is my latest account balance?"}'

Features

  • Document Q&A: Upload and query PDF, DOCX, CSV, TXT files
  • Conversational Memory: AI remembers conversation context
  • Professional Chat UI: Modern interface with avatars and timestamps
  • Local LLM: Uses Ollama for privacy-first AI
  • RAG Pipeline: Semantic search with vector embeddings (a minimal sketch follows this list)
  • Real-time Responses: Fast document retrieval and generation
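
To give a feel for how the RAG pipeline fits together, here is a minimal, self-contained sketch using the same stack (LangChain, FAISS, Ollama). It is not the project's actual pipeline; it assumes the langchain-community and faiss-cpu packages, the models listed under Configuration, and Ollama running locally.

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS

# Toy documents standing in for uploaded files after chunking.
docs = [
    "Account 1234 had a closing balance of $5,210.44 in March.",
    "Account 1234 had a closing balance of $4,980.10 in February.",
]

# Embed the chunks and build an in-memory FAISS index.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
store = FAISS.from_texts(docs, embeddings)

# Retrieve the chunks most relevant to the question.
question = "What is my latest account balance?"
context = store.similarity_search(question, k=2)

# Ask the local LLM to answer from the retrieved context only.
llm = Ollama(model="mistral:7b")
prompt = (
    "Answer using only this context:\n"
    + "\n".join(d.page_content for d in context)
    + f"\n\nQuestion: {question}"
)
print(llm.invoke(prompt))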

Configuration

Models

  • LLM: mistral:7b (configurable in backend/.env)
  • Embeddings: nomic-embed-text
  • Alternative: Switch to llama3.2:1b for faster responses

Environment Variables

# backend/.env
# host.docker.internal lets the Docker containers reach Ollama on the host;
# use http://localhost:11434 when running the backend outside Docker
OLLAMA_BASE_URL=http://host.docker.internal:11434
EMBEDDING_MODEL=nomic-embed-text
LLM_MODEL=mistral:7b
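
A minimal sketch of how a backend process might read these values at startup (illustrative only; the fallback defaults are assumptions, not the repository's actual settings code):

import os

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "nomic-embed-text")
LLM_MODEL = os.getenv("LLM_MODEL", "mistral:7b")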