This is a complete RAG application that can use an arbitrary GGUF-quantized model hosted on Hugging Face, for example https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF
It builds a complete RAG stack including:
- Vectorstore based on pgvector
- Embedding model
- Locally run LLM
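At a high level, a request flows through these pieces as: embed the query, retrieve the closest documents from the pgvector store, fill the prompt template with the query and the retrieved snippets, and let the local LLM answer. The sketch below only illustrates that flow; every function in it is a placeholder stub, not the actual service code:

```python
# Illustrative sketch of the RAG request flow.
# All functions are placeholder stubs, not the real service implementation.
from typing import List

def embed(text: str) -> List[float]:
    """Placeholder: the real service calls the embedding model."""
    return [0.0]

def vectorstore_search(vector: List[float], k: int = 5) -> List[str]:
    """Placeholder: the real service runs a pgvector similarity query."""
    return ["retrieved snippet 1", "retrieved snippet 2"]

def llm_complete(prompt: str) -> str:
    """Placeholder: the real service calls the locally run GGUF model."""
    return "model answer"

def answer(query: str) -> str:
    snippets = vectorstore_search(embed(query))
    bullet_list = "\n".join(f"- {s}" for s in snippets)
    prompt = f"Answer using the context below.\n{bullet_list}\n\nQuestion: {query}"
    return llm_complete(prompt)

print(answer("What does the service do?"))
```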
Make sure you have Docker installed and configured.
Then clone the repo and run:

```sh
make up
```
Send requests to the chat service using the script `requestsender.py`:

```sh
python ./requestsender.py -i '<Your query here>' -f
```
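Under the hood this presumably amounts to an HTTP call to the chat service. If you want to script your own client, the minimal sketch below assumes the service is reachable on localhost port 8000 at a `/chat` endpoint and accepts a JSON body with a `query` field; these are assumptions, so check `requestsender.py` for the actual host, port, path, and payload shape:

```python
# Hand-rolled client sketch; endpoint, port, and payload shape are assumptions,
# see requestsender.py for the real values.
import requests

response = requests.post(
    "http://localhost:8000/chat",                 # assumed host, port, and path
    json={"query": "What does the service do?"},  # assumed payload shape
    timeout=300,                                   # local LLM inference can be slow
)
print(response.json())
```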
By default, this branch uses https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF. The model is selected in `.env` via the variables `HG_REPO` and `HG_FILE`.
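For example, a `.env` for the default OpenHermes model could look like the snippet below; the GGUF filename is only an example, so verify it against the files actually published in the Hugging Face repo:

```env
# Hugging Face repository and GGUF file to use
# (filename is an example; check the repo's file list for the quantization you want)
HG_REPO=TheBloke/OpenHermes-2.5-Mistral-7B-GGUF
HG_FILE=openhermes-2.5-mistral-7b.Q4_K_M.gguf
```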
Since different models expect differently formatted prompts, create a new Jinja2 prompt template for your model if needed. Be sure to include the variables `INPUTQUERY` and `BULLET_LIST`.
The templates are found under `chat-service/prompt_templates`.
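For example, OpenHermes 2.5 expects ChatML-formatted prompts, so a template for it could look roughly like the sketch below; the system prompt wording and the exact way the retrieved context is injected are illustrative, so compare with the templates already shipped in that directory:

```jinja
<|im_start|>system
You are a helpful assistant. Answer using only the context below.
{{ BULLET_LIST }}<|im_end|>
<|im_start|>user
{{ INPUTQUERY }}<|im_end|>
<|im_start|>assistant
```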