This is based on the LlamaIndex FastAPI backend bootstrapped via `create-llama`, with a locally run LLM via Ollama.
First, go to `backend` and create an `.env` file with your environment variables; a sample setup can be found in `backend/.env-sample`.
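For a local Ollama setup, the file might look roughly like the sketch below. The variable names here are illustrative placeholders only; check `backend/.env-sample` for the actual ones used by this project.

```
# Hypothetical example values -- see backend/.env-sample for the real variable names
MODEL_PROVIDER=ollama
MODEL=llama2
OLLAMA_BASE_URL=http://localhost:11434
```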
Next, set up the environment with Poetry:
poetry install
poetry shell
Put your data sources in the `backend/data` folder.
Generate the embeddings of the documents in the `./data` directory:
poetry run generate
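Conceptually, this step follows the standard LlamaIndex ingestion pattern sketched below. This is an illustration only, not the project's actual `generate` script; it assumes a recent `llama-index` release and that the embedding model is configured elsewhere (e.g. via the `.env` file):

```python
# Rough sketch of the ingestion step (not the actual generate script).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load all documents from the data directory.
documents = SimpleDirectoryReader("./data").load_data()

# Embed the documents and build a vector index using the configured embedding model.
index = VectorStoreIndex.from_documents(documents)

# Persist the index to disk so the server can load it at startup.
index.storage_context.persist(persist_dir="./storage")
```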
Run the development server:
python main.py
There are two API endpoints:

- `/api/chat/` - a streaming chat endpoint
- `/api/chat/request` - a non-streaming chat endpoint
You can call the streaming endpoint with curl:
curl --location 'localhost:8000/api/chat/' \
--header 'Content-Type: application/json' \
--data '{ "messages": [{ "role": "user", "content": "Hello" }] }'
or its non-streaming counterpart:
curl --location 'localhost:8000/api/chat/request' \
--header 'Content-Type: application/json' \
--data '{ "messages": [{ "role": "user", "content": "Hello" }] }'
For a simple CLI assistant, run `poetry run chat-cli` after starting the server.
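For a rough idea of what such a CLI does, a minimal chat loop against the non-streaming endpoint could look like the sketch below (an assumption-laden illustration, not the actual `chat-cli` implementation).

```python
import requests

# Hypothetical minimal chat loop -- not the real chat-cli implementation.
messages = []
while True:
    user_input = input("you> ").strip()
    if user_input.lower() in {"exit", "quit"}:
        break
    messages.append({"role": "user", "content": user_input})
    resp = requests.post(
        "http://localhost:8000/api/chat/request",
        json={"messages": messages},
    )
    resp.raise_for_status()
    answer = resp.json()  # the exact response shape depends on the backend version
    print(f"assistant> {answer}")
    messages.append({"role": "assistant", "content": str(answer)})
```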
Roadmap:

- Streaming output ✅
- CLI ✅
- VectorDB (e.g. Weaviate)
- React Frontend