This project is motivated by a blog post from LlamaIndex (link).
This RAG application runs completely locally. It uses Ollama (to run the LLM locally), LlamaIndex (the RAG framework), Qdrant (the vector DB), and Flask (the web API).
Install Ollama to run the LLM on your local machine, then:
- Terminal-1: `ollama run mistral` (or `ollama run mixtral`)
- Terminal-2: `docker run -p 6333:6333 qdrant/qdrant`
- Terminal-3: `python3 -m venv venv && source venv/bin/activate && pip install llama-index qdrant_client torch transformers flask flask-cors`
Just to make sure the LLM is listening, run test.py:

    from llama_index.llms import Ollama

    llm = Ollama(model="mistral")
    response = llm.complete("Who is Laurie Voss?")
    print(response)
Then run the Flask application:

    python app.py
Open another terminal and check the app:
- Terminal-4: `curl --location "http://127.0.0.1:5000/process_form" --form 'query="What does the author -----?"'`
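For reference, here is a minimal sketch of what app.py could look like. The `/process_form` endpoint and the `query` form field come from the curl call above; the `rag_docs` collection name, the `embed_model="local"` choice, and re-opening the index from Qdrant (rather than rebuilding it) match the ingestion sketch earlier and are assumptions, not the author's exact code.

```python
# app.py -- hedged sketch of the Flask API serving the RAG query engine.
from flask import Flask, request, jsonify
from flask_cors import CORS
import qdrant_client
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.llms import Ollama
from llama_index.vector_stores.qdrant import QdrantVectorStore

app = Flask(__name__)
CORS(app)

# Re-open the index that the ingestion step stored in Qdrant.
client = qdrant_client.QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="rag_docs")
service_context = ServiceContext.from_defaults(llm=Ollama(model="mistral"), embed_model="local")
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
query_engine = index.as_query_engine()

@app.route("/process_form", methods=["POST"])
def process_form():
    # The curl call sends the question as a form field named "query".
    query = request.form.get("query", "")
    response = query_engine.query(query)
    return jsonify({"response": str(response)})

if __name__ == "__main__":
    app.run(port=5000)  # matches the curl target http://127.0.0.1:5000
```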
Note:
- The dataset should be in JSON format.
- Mixtral needs a hefty 48 GB of RAM to run smoothly, so I used the Mistral 7B model. You can run Mixtral with the same procedure if your machine can manage 48 GB of RAM.