Query, ask, and chat with a document index via a friendly web UI!
Simply run it via Docker Compose. An example configuration can be found in the `docker-compose.yml` file.
The UI is then available at http://localhost:8501 and the API Swagger documentation at http://localhost:8001/docs.
If you want to build it from source, download the repo and use the `docker-compose-src-*.yml` files.
An example `.env` configuration can be found in `.env.example`.
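If you want to check which values a `.env` file will provide before starting the containers, the minimal sketch below parses simple `KEY=VALUE` lines. It is a simplified, hypothetical helper (`parse_env_text` and `parse_env_file` are not part of this repo): it skips comments and blank lines and strips one pair of surrounding quotes, but it does not implement Docker Compose's variable interpolation or other advanced rules.

```python
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines (simplified: no interpolation, no multiline values)."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Not a KEY=VALUE line, ignore it.
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def parse_env_file(path: str) -> Dict[str, str]:
    """Read a .env file from disk and parse it with parse_env_text."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))
```

For example, `parse_env_file(".env")` returns a dict such as `{"CHATMODEL": "CPU", ...}`, depending on what your file contains.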
All modules are available as prebuilt containers via the GitHub Container Registry.
The demo has three main components:
- Streamlit web UI
- FastAPI endpoint, which hosts the models and QA pipelines
- Importers, which import text files into the system
The chat module lets the user either use an LLM to query and summarize documents retrieved from the document index, or simply chat with the model. It supports multiple chat adapters, which are exposed via a streaming API. The following libraries are supported:
- OpenAI: Uses the default ChatGPT API. To use this adapter, an OpenAI API token must be supplied via `OPENAI_TOKEN`.
- Hugging Face: Supports nearly all LLMs on the Hugging Face Hub, including PEFT-finetuned models. To run it, a GPU must be passed through to the container running the API.
- llama-rs: Can run GGML-converted models such as Alpaca on a CPU with relatively low resource usage. Use this adapter if you don't have a GPU.
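The adapter is chosen with the `CHATMODEL` variable (`OPENAI`, `GPU` or `CPU`, see the configuration tables). The sketch below illustrates that selection pattern under stated assumptions: every name in it (`openai_adapter`, `gpu_adapter`, `cpu_adapter`, `select_adapter`) is hypothetical, and the stubs only echo the prompt token by token to mimic a streaming API instead of calling OpenAI, Hugging Face or llama-rs.

```python
import os
from typing import Callable, Dict, Iterator, Mapping


# Hypothetical adapter stubs: each yields the reply piece by piece, mimicking a
# streaming API. Real adapters would call OpenAI, Hugging Face or llama-rs here.
def openai_adapter(prompt: str) -> Iterator[str]:
    yield from ["[openai]", " ", prompt]


def gpu_adapter(prompt: str) -> Iterator[str]:
    yield from ["[huggingface]", " ", prompt]


def cpu_adapter(prompt: str) -> Iterator[str]:
    yield from ["[llama-rs]", " ", prompt]


# Maps the CHATMODEL values from the configuration table to an adapter.
ADAPTERS: Dict[str, Callable[[str], Iterator[str]]] = {
    "OPENAI": openai_adapter,
    "GPU": gpu_adapter,
    "CPU": cpu_adapter,
}


def select_adapter(env: Mapping[str, str] = os.environ) -> Callable[[str], Iterator[str]]:
    """Pick the chat adapter named by CHATMODEL (default: CPU)."""
    name = env.get("CHATMODEL", "CPU").upper()
    if name not in ADAPTERS:
        raise ValueError(f"Unknown CHATMODEL {name!r}; expected one of {sorted(ADAPTERS)}")
    if name == "OPENAI" and not env.get("OPENAI_TOKEN"):
        raise ValueError("The OPENAI adapter requires OPENAI_TOKEN to be set")
    return ADAPTERS[name]
```

A caller would consume the stream incrementally, e.g. `for chunk in select_adapter()("Hello"): print(chunk, end="")`. Returning a generator keeps the interface identical for all three backends, which is what lets a single streaming endpoint serve any of them.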
The semantic search and extractive QA modules use Haystack to query the Elasticsearch database. Nearly all embedding and QA models on the Hugging Face Hub are supported. By default, they run on the CPU, since the GPU is usually allocated to the chat model.
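Semantic search ranks documents by how close their embeddings are to the query embedding, using the measure set in `SIMILARITY` (cosine by default). In this project that comparison happens inside Haystack and Elasticsearch; the pure-Python sketch below only illustrates the cosine measure itself, using toy 3-dimensional vectors instead of real 384-dimensional (`EMBEDDING_DIM`) embeddings. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension (cf. EMBEDDING_DIM)")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction; treat it as unrelated.
    return dot / (norm_a * norm_b)


def rank_documents(
    query: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, emb)) for doc_id, emb in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For instance, with the query `[1, 0, 0]`, a document embedded as `[1, 0, 0]` ranks above one embedded as `[1, 1, 0]`, which in turn ranks above an orthogonal `[0, 1, 0]`.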
The importers collect text documents and convert them to Haystack Documents, which are then committed to the database. The following importers are available:
- Wikipedia: Uses a Wikipedia minidump to populate the database with about 500,000 Wikipedia articles.
- ST4: Can be used to import technical documentation created with the Schema ST4 software.
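Conceptually, an importer splits raw text into passages and attaches metadata before writing them to the document store. The sketch below shows that conversion step only, under assumptions: the helper `to_documents`, the word-based chunking and the `max_words=100` passage size are hypothetical and may differ from what the actual importers do. The returned dicts mirror the `content` and `meta` fields of a Haystack `Document`.

```python
from typing import Dict, List


def to_documents(text: str, source: str, max_words: int = 100) -> List[Dict[str, object]]:
    """Split text into word-bounded passages and wrap each in a document dict."""
    words = text.split()
    docs: List[Dict[str, object]] = []
    for start in range(0, len(words), max_words):
        passage = " ".join(words[start:start + max_words])
        docs.append({
            "content": passage,
            # Keep track of where each passage came from for later display.
            "meta": {"source": source, "passage_index": start // max_words},
        })
    return docs
```

Smaller passages give the extractive QA model a tighter context to search, at the cost of more entries in the index; the right size depends on the embedding and QA models in use.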
To be able to run this repo on different hardware configurations, many settings are configurable via environment variables.

API:

Environment Variable | Default | Description |
---|---|---|
HUGGINGFACE_TOKEN | | Hugging Face token |
ELASTICSEARCH_HOST | localhost | Elasticsearch host address |
ELASTICSEARCH_PORT | 9200 | Elasticsearch port number |
ELASTICSEARCH_USER | | Elasticsearch user |
ELASTICSEARCH_PASSWORD | | Elasticsearch password |
EMBEDDING_DIM | 384 | Embedding dimension |
SIMILARITY | cosine | Similarity measure |
EMBEDDING_MODEL | LLukas22/all-MiniLM-L12-v2-embedding-all | Embedding model |
EXTRACTIVE_QA_MODEL | LLukas22/all-MiniLM-L12-v2-qa-en | Extractive QA model |
USE_GPU | False | Use GPU for QA and embedding |
USE_8BIT | False | Load models in 8-bit via bitsandbytes |
CONCURENCY_LIMIT | 5 | Concurrency limit of the API |
DEBUG | True | Debug mode |
CHATMODEL | CPU | Chat adapter to use (`OPENAI`, `GPU`, `CPU`) |
CHAT_MAX_INPUT_LENGTH | 2000 | Chat max input length |
OPENAI_TOKEN | None | OpenAI token |
BASE_CHAT_MODEL | decapoda-research/llama-7b-hf | Base chat model |
USE_PEFT | True | Apply a PEFT adapter to the base chat model |
ADAPTER_CHAT_MODEL | tloen/alpaca-lora-7b | PEFT adapter chat model |
ADAPTER_APPLY_OPTIMIZATIONS | True | Apply Torch optimizations |
CPU_MODEL_REPO | Sosaka/Alpaca-native-4bit-ggml | CPU model repository |
CPU_MODEL_FILENAME | ggml-alpaca-7b-q4.bi | CPU model filename |
CPU_MODEL_THREADS | 8 | CPU model threads |
CPU_MODEL_KV_16 | True | Use f16 for the CPU model's KV store |
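The values above are plain strings in the environment, so booleans and integers have to be converted when they are read. The sketch below shows one way to do that for a handful of the API variables, falling back to the documented defaults. It is an illustration, not the project's own settings code: `ApiSettings`, `load_api_settings` and the accepted boolean spellings are assumptions. `CONCURENCY_LIMIT` is spelled exactly as in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Accept the True/False spelling used in the table plus common variants.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class ApiSettings:
    chat_model: str
    use_gpu: bool
    concurrency_limit: int
    chat_max_input_length: int
    openai_token: Optional[str]


def load_api_settings(env: Mapping[str, str] = os.environ) -> ApiSettings:
    """Read a subset of the API variables, falling back to the documented defaults."""
    return ApiSettings(
        chat_model=env.get("CHATMODEL", "CPU"),
        use_gpu=_as_bool(env.get("USE_GPU", "False")),
        concurrency_limit=int(env.get("CONCURENCY_LIMIT", "5")),
        chat_max_input_length=int(env.get("CHAT_MAX_INPUT_LENGTH", "2000")),
        # An empty value is treated the same as "not set".
        openai_token=env.get("OPENAI_TOKEN") or None,
    )
```

Converting everything in one place means a typo such as `CONCURENCY_LIMIT=five` fails immediately with a `ValueError` at startup rather than deep inside a request.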

Web UI:

Environment Variable | Default |
---|---|
SYSTEM_PROMPT | The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know. \n\n Current Conversation: |
CHAT_WELCOME_MESSAGE | Hello, I will try to answer any questions you have for me! Untick the checkbox to disable the document search and chat with me normally. |
WELCOME_MESSAGE | This demo was initialized with about 500,000 English Wikipedia articles from April 1st, 2023. Feel free to ask the system about any topic you like. \n\n |
API_HOST | localhost |
API_PORT | 8001 |
ENABLE_ADMIN | False |
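The web UI reaches the API through `API_HOST` and `API_PORT`. With the defaults this resolves to http://localhost:8001, the same address used for the Swagger documentation. When the UI and API run in separate containers, `localhost` inside the UI container refers to the UI container itself, so `API_HOST` should instead point at the API service (for example, its service name from your Compose file). The helper below is a hypothetical sketch that assumes plain HTTP.

```python
import os
from typing import Mapping


def api_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the API base URL the web UI talks to from API_HOST and API_PORT."""
    host = env.get("API_HOST", "localhost")
    port = int(env.get("API_PORT", "8001"))  # Fail early on a non-numeric port.
    return f"http://{host}:{port}"
```

The Swagger page is then at `f"{api_base_url()}/docs"`.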

Wikipedia importer:

Environment Variable | Default | Description |
---|---|---|
ELASTIC_HOST | localhost | Elasticsearch host address |
ELASTIC_PORT | 9200 | Elasticsearch port number |
ELASTIC_EMBEDDING_DIM | 384 | Embedding dimension |
CACHE_DIR | ./importer_cache | Cache directory path |
WIKI_URL | https://dumps.wikimedia.org/simplewiki/20230401/simplewiki-20230401-pages-articles-multistream.xml.bz2 | URL of the Wiki-dump to download |
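The default `WIKI_URL` points to a bz2-compressed, multistream MediaWiki XML dump. Because the file is large, it is best processed as a stream rather than loaded at once. The sketch below shows one way to do that with Python's standard `bz2` module (which reads multi-stream files) and `xml.etree.ElementTree.iterparse`. It is an illustration, not the importer's actual code: `iter_pages` is a hypothetical name, and converting wikitext into clean article text is not shown.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple


def _local(tag: str) -> str:
    # MediaWiki dumps use an XML namespace; strip "{...}" to compare local names.
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) pairs from a bz2-compressed MediaWiki XML dump."""
    with bz2.open(stream, "rb") as xml_file:
        title, text = "", ""
        for _event, elem in ET.iterparse(xml_file, events=("end",)):
            name = _local(elem.tag)
            if name == "title":
                title = elem.text or ""
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, text
                title, text = "", ""  # Do not leak values into the next page.
                elem.clear()  # Release the finished page's children.
```

Usage with a downloaded dump: `with open("simplewiki-20230401-pages-articles-multistream.xml.bz2", "rb") as f: for title, text in iter_pages(f): ...`.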

ST4 importer:

Environment Variable | Default | Description |
---|---|---|
ELASTIC_HOST | localhost | Elasticsearch host address |
ELASTIC_PORT | 9200 | Elasticsearch port number |
ELASTIC_EMBEDDING_DIM | 384 | Embedding dimension |
ST4_FOLDER | ./.st4_files | ST4 files folder path |