The purpose of this package is to offer a convenient question-answering system with a simple YAML-based configuration that enables interaction with multiple collections of local documents. Special attention is given to improvements in various components of the system in addition to LLMs - better document parsing, hybrid search, deep linking, re-ranking, the ability to customize embeddings, and more. The package is designed to work with custom Large Language Models (LLMs) – whether from OpenAI or installed locally.
- Supported formats:
  - `.md` - divides files based on logical components such as headings, subheadings, and code blocks. Supports additional features like cleaning image links, adding custom metadata, and more.
  - `.pdf` - MuPDF-based parser.
  - `.html`, `.epub` - supported through the `Unstructured` pre-processor - https://unstructured-io.github.io/unstructured/
  - `.docx` - custom parser, supports nested tables.
- Supports multiple collections of documents and filtering the results by a collection.
- Generates dense embeddings from a folder of documents and stores them in a vector database (ChromaDB). The following embedding models are supported:
  - Huggingface embeddings.
  - Sentence-transformers-based models, e.g., `multilingual-e5-base`.
  - Instructor-based models, e.g., `instructor-large`.
- Generates sparse embeddings using SPLADE (https://github.com/naver/splade) to enable hybrid search (sparse + dense).
- Supports the "Retrieve and Re-rank" strategy for semantic search - see https://www.sbert.net/examples/applications/retrieve_rerank/README.html
- Allows interaction with embedded documents, supporting the following models and methods (including locally hosted):
  - OpenAI models (ChatGPT 3.5/4 and Azure OpenAI).
  - HuggingFace models.
  - Llama cpp supported models - for the full list, see https://github.com/ggerganov/llama.cpp#description
  - AutoGPTQ models (temporarily disabled due to broken dependencies).
- Other features:
  - Simple CLI and web interfaces.
  - Deep linking into document sections - jump to an individual PDF page or a header in a markdown file.
  - Ability to save responses to an offline database for future analysis.
  - Experimental API.
- Tested on Ubuntu 22.04.
- An Nvidia GPU is required for embedding generation and for using locally hosted models.
- Python 3.10, including dev packages (`python3-dev` on Ubuntu).
- Nvidia CUDA Toolkit (tested with v11.7) - https://developer.nvidia.com/cuda-toolkit
- To interact with OpenAI models, create a `.env` file in the root directory of the repository containing your OpenAI API key. A template for the `.env` file is provided in `.env_template` (see the example after this list).
- For parsing `.epub` documents, Pandoc is required - https://pandoc.org/installing.html
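
For illustration, a minimal `.env` could contain just the key below. The variable name follows the common OpenAI client convention and is an assumption here; `.env_template` lists the exact variables this project expects.

```bash
# Illustrative .env - see .env_template for the exact variable names
OPENAI_API_KEY=<your-openai-api-key>
```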
```bash
git clone https://github.com/snexus/llm-search.git
cd llm-search

# Create a new environment
python3 -m venv .venv

# Activate the new environment
source .venv/bin/activate

# Set variables for llama-cpp to compile with CUDA.
# Optionally, point to the root location of the installed Nvidia CUDA Toolkit (/usr/local/cuda on Ubuntu)
source ./setvars.sh /usr/local/cuda

# Install the package
pip install . # or `pip install -e .` for development
```
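
If the installation succeeded, the `llmsearch` CLI used in the following steps should now be available inside the virtual environment. Assuming it exposes a standard `--help` flag, a quick sanity check is:

```bash
llmsearch --help
```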
To create a configuration file in YAML format, refer to the example template provided in `sample_templates/config_template.yaml`.

The sample configuration file specifies how to load one of the supported locally hosted models, downloaded from Huggingface - https://huggingface.co/TheBloke/wizardLM-13B-1.0-GGML/resolve/main/WizardLM-13B-1.0.ggmlv3.q5_K_S.bin
As an alternative, uncomment the `llm` section for an OpenAI model.
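
A reasonable starting point is to copy the template and edit it, assuming you keep the resulting file at the `/path/to/config.yaml` location used in the commands below:

```bash
cp sample_templates/config_template.yaml /path/to/config.yaml
```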
To create embeddings from documents, follow these steps:
- Open the command line interface.
- Run the following command:
```bash
llmsearch index create -c /path/to/config.yaml
```
Based on the example configuration above, executing this command will scan a folder containing markdown and pdf files (`/path/to/documents`), excluding the files in `subfolder1` and `subfolder2`, and generate a dense embeddings database in the `/path/to/embedding/folder` directory. Additionally, a local cache folder (`/path/to/cache/folder`) will be used to store embedding models, LLM models, and tokenizers.
The default vector database for dense embeddings is ChromaDB, and the default embedding model is `e5-large-v2` (unless specified otherwise using the `embedding_model` section, as above), which is known for its high performance. You can find more information about this and other embedding models at https://huggingface.co/spaces/mteb/leaderboard.
In addition to dense embeddings, sparse embeddings will be generated in `/path/to/embedding/folder/splade` using the SPLADE algorithm. Both dense and sparse embeddings will be used for context search.
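
For orientation, the indexing-related part of such a configuration might look roughly like the sketch below. Apart from `embedding_model`, which is mentioned above, the key names and nesting are illustrative assumptions - consult `sample_templates/config_template.yaml` for the exact schema.

```yaml
# Illustrative sketch only - apart from embedding_model, key names are assumptions;
# see sample_templates/config_template.yaml for the authoritative schema.
cache_folder: /path/to/cache/folder           # embedding models, LLMs and tokenizers are cached here

embeddings:
  embeddings_path: /path/to/embedding/folder  # dense ChromaDB index; splade/ sparse index is created alongside
  embedding_model:                            # optional - the default dense model is e5-large-v2
    model_name: "intfloat/e5-large-v2"
  document_settings:
    - doc_path: /path/to/documents            # folder with markdown and pdf files
      exclude_paths:
        - /path/to/documents/subfolder1
        - /path/to/documents/subfolder2
      scan_extensions:
        - md
        - pdf
```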
To interact with the documents using one of the supported LLMs, follow these steps:
- Open the command line interface.
- Run one of the following commands:
  - Web interface: `llmsearch interact webapp -c /path/to/config.yaml`
  - CLI interface: `llmsearch interact llm -c /path/to/config.yaml`
Based on the example configuration provided in the sample configuration file, the following actions will take place:
- The system will load a quantized GGML model using the LlamaCpp framework. The model file is located at `/storage/llm/cache/WizardLM-13B-1.0-GGML/WizardLM-13B-1.0.ggmlv3.q5_K_S.bin`.
- The model will be partially loaded into the GPU (30 layers) and partially into the CPU (the remaining layers). The `n_gpu_layers` parameter can be adjusted according to the hardware limitations.
- Additional LlamaCpp-specific parameters specified in `model_kwargs` from the `llm->params` section will be passed to the model.
- The system will query the embeddings database using a hybrid search algorithm combining sparse and dense embeddings. It will provide the most relevant context from different documents, up to a maximum context size of 4096 characters (`max_char_size` in `semantic_search`).
- When displaying paths to relevant documents, the system will replace the part of the path `/storage/llm/docs/` with `obsidian://open?vault=knowledge-base&file=`. This replacement is based on the settings `substring_search` and `substring_replace` in `semantic_search->replace_output_path`.
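
The corresponding `llm` and `semantic_search` sections might look roughly like the sketch below. The keys `llm`, `params`, `model_kwargs`, `n_gpu_layers`, `semantic_search`, `max_char_size`, `replace_output_path`, `substring_search`, and `substring_replace` are taken from the description above; the remaining keys, values, and exact nesting are assumptions - refer to the sample template for the authoritative structure.

```yaml
# Illustrative sketch - only the keys named in the text above are taken from the project;
# the surrounding structure and example values are assumptions based on the sample template.
llm:
  type: llamacpp                            # assumption: selects the LlamaCpp loader for GGML models
  params:
    model_path: /storage/llm/cache/WizardLM-13B-1.0-GGML/WizardLM-13B-1.0.ggmlv3.q5_K_S.bin
    n_gpu_layers: 30                        # layers offloaded to the GPU; lower this on smaller GPUs
    model_kwargs:                           # extra LlamaCpp-specific parameters passed to the model
      temperature: 0.7
      max_tokens: 512

semantic_search:
  max_char_size: 4096                       # maximum size of the combined context, in characters
  replace_output_path:
    - substring_search: "/storage/llm/docs/"
      substring_replace: "obsidian://open?vault=knowledge-base&file="
```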
To launch the API, supply the path to the config file in the `FASTAPI_LLM_CONFIG` environment variable and launch `llmsearchapi`:

```bash
FASTAPI_LLM_CONFIG="/path/to/config.yaml" llmsearchapi
```