Local-first retrieval augmented generation assistant with conversational memory.
This repository hosts a FastAPI backend and a React frontend with a professional chat UI.
backend/ # FastAPI application with RAG capabilities
frontend/ # React + Vite interface with professional chat UI
You will need Docker Desktop and Ollama installed on your system. Ollama should be running with the mistral:7b and nomic-embed-text models downloaded. Ensure you have at least 8GB RAM and 10GB free storage for the models.
- Docker and Docker Compose
- Ollama running locally with models:
ollama serve
ollama pull mistral:7b
ollama pull nomic-embed-text
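To confirm Ollama is reachable and both models are present before starting the stack, you can query Ollama's /api/tags endpoint. A minimal sketch (assumes the requests package is installed):

```python
# Sketch: verify Ollama is up and the required models are pulled.
# Assumes Ollama's default address and the `requests` package.
import requests

REQUIRED = {"mistral:7b", "nomic-embed-text"}

tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
available = {m["name"] for m in tags.get("models", [])}

# Tag names may carry a ":latest" suffix, so match on prefixes as well.
missing = {r for r in REQUIRED
           if not any(a == r or a.startswith(r + ":") for a in available)}
print("Missing models:", missing or "none")
```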
# Start both backend and frontend
docker-compose up -d
# View logs
docker-compose logs -f
# Stop everything
docker-compose down
- Frontend (Chat UI): http://localhost:5173/
- Backend API: http://localhost:8000/
- API Documentation: http://localhost:8000/docs
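Once the containers are up, a quick way to confirm both services are responding is to hit the URLs above. A minimal sketch (assumes the requests package):

```python
# Sketch: check that the frontend and backend answer on their published ports.
import requests

for name, url in [("frontend", "http://localhost:5173/"),
                  ("backend docs", "http://localhost:8000/docs")]:
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name}: HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name}: not reachable ({exc})")
```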
See backend/README.md for backend details.
Run the backend locally:
cd backend
pip install -r requirements.txt
uvicorn backend.main:app --reload
The frontend (in frontend/) uses Vite, React, TypeScript, Tailwind CSS, and shadcn/ui.
cd frontend
npm install
npm run dev
Create frontend/.env.local with:
VITE_API_BASE_URL=http://localhost:8000
PYTHONPATH=. pytest backend/tests
Test the query endpoint:
# PowerShell
$body = @{ question = "What is my latest account balance?" } | ConvertTo-Json
Invoke-RestMethod -Uri "http://localhost:8000/query" -Method POST -Body $body -ContentType "application/json"
# curl
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "What is my latest account balance?"}'- Document Q&A: Upload and query PDF, DOCX, CSV, TXT files
- Document Q&A: Upload and query PDF, DOCX, CSV, TXT files
- Conversational Memory: AI remembers conversation context
- Professional Chat UI: Modern interface with avatars and timestamps
- Local LLM: Uses Ollama for privacy-first AI
- RAG Pipeline: Semantic search with vector embeddings (see the sketch after this list)
- Real-time Responses: Fast document retrieval and generation
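To make the retrieve-then-generate flow behind these features concrete, here is a self-contained sketch that talks directly to Ollama's /api/embeddings and /api/generate endpoints. It is an illustration only, not the repository's pipeline, and it assumes Ollama is running locally with both models pulled and the requests package installed:

```python
# Illustration of a minimal RAG loop: embed chunks, retrieve by cosine
# similarity, then generate an answer grounded in the best-matching chunk.
# NOT the backend's actual implementation.
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

# Toy "document store"; in the real app these would be chunks of uploaded files.
chunks = [
    "Account balance on 2024-05-01: $1,250.00.",
    "The cafeteria serves lunch between 11:30 and 13:30.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

question = "What is my latest account balance?"
q_vec = embed(question)

# Semantic search: pick the chunk most similar to the question.
best_chunk, _ = max(index, key=lambda item: cosine(q_vec, item[1]))

# Generation grounded in the retrieved context.
prompt = (f"Answer the question using only this context:\n{best_chunk}\n\n"
          f"Question: {question}")
r = requests.post(f"{OLLAMA}/api/generate",
                  json={"model": "mistral:7b", "prompt": prompt, "stream": False})
r.raise_for_status()
print(r.json()["response"])
```

Conversational memory would typically add one more step on top of this: previous turns are included in the prompt so the model can resolve follow-up questions.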
- LLM: mistral:7b (configurable in backend/.env)
- Embeddings: nomic-embed-text
- Alternative: Switch to llama3.2:1b for faster responses
# backend/.env
OLLAMA_BASE_URL=http://host.docker.internal:11434
EMBEDDING_MODEL=nomic-embed-text
LLM_MODEL=mistral:7b
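The backend is expected to read these values from its environment at startup; a minimal sketch of that mapping with os.getenv (the actual settings code in backend/ may use a different loader):

```python
# Sketch: how the variables in backend/.env could map to Python settings.
# The real backend may rely on pydantic-settings, python-dotenv, or similar.
import os

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://host.docker.internal:11434")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "nomic-embed-text")
LLM_MODEL = os.getenv("LLM_MODEL", "mistral:7b")

print(f"LLM {LLM_MODEL} via {OLLAMA_BASE_URL}; embeddings: {EMBEDDING_MODEL}")
```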