Set up a vector database with a search API to assist RAG
Retrieval Augmented Generation (RAG) is a process by which relevant documentation is selected from a corpus and appended to the prompt. This enables specialised, highly focused context to be added to the model's input.
We have a couple of services coming up which could benefit from this.
RAG works by converting, or embedding, strings into vectors. The source corpus or knowledgebase is encoded into vectors, as is the search query. These vectors can then be compared to find the closest matches (or the furthest, depending on what you want).
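To make that concrete, here is a minimal sketch of the embed-and-compare step using sentence-transformers (the model name and corpus are only examples, not decisions):

```python
# Minimal illustration of embed-and-compare with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "How to configure a webhook trigger",
    "Writing a job expression",
    "Deploying a workflow to production",
]
query = "how do I set up a webhook?"

# Encode the corpus and the query into vectors
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine similarity: higher score means a closer match
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best = scores.argmax().item()
print(corpus[best], scores[best].item())
```

A vector database does the same comparison at scale, with an index over the stored vectors.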
The first step is to add support for a vector database to apollo. Here is a rough spec:
- Add a vector database like Milvus to apollo
- Create a service which allows the database to be searched and returns relevant strings. I think the search API is something like `search(corpus_name, search_string)`, and the service will convert the search string into an embedding and run it against the `corpus_name` corpus in the database (see the sketch after this list).
- When the docker image is built, take a collection of corpuses (this can be simplified in this step: the corpus can be a hard-coded list of strings) and embed them into the database. The database should be built into the docker image.
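As a rough sketch of what the search service could look like (this assumes Milvus Lite via pymilvus's `MilvusClient`; the file name, model and field names are placeholders, and the real design is open for discussion):

```python
# Hypothetical shape of the search service: each corpus_name maps to a
# Milvus collection, and the query string is embedded with the same model
# that embedded the corpus at build time.
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = MilvusClient("embeddings.db")  # Milvus Lite file bundled in the image

def search(corpus_name: str, search_string: str, limit: int = 5) -> list[str]:
    # Embed the query with the same model that embedded the corpus
    query_vector = model.encode(search_string).tolist()
    results = client.search(
        collection_name=corpus_name,
        data=[query_vector],
        limit=limit,
        output_fields=["text"],
    )
    # Return the stored strings for the closest matches
    return [hit["entity"]["text"] for hit in results[0]]
```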
The runtime embeddings database is basically read-only. I don't see any need to extend the embeddings on the fly.
Note that the embedding function requires a pretrained model which will likely be 50-100MB in size. This will have to be bundled into the docker image and may have performance and storage implications for our deployment.
Embedding the knowledgebase can be done offline, when we build the image, but search queries must be embedded at runtime.
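The build-time half could look something like this (a sketch only, run while the docker image is built; the corpus contents, collection name and vector dimension are placeholders that pair with the search sketch above):

```python
# Build-time ingestion sketch: embed a hard-coded corpus and write it into
# a Milvus collection that ships inside the docker image.
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = MilvusClient("embeddings.db")

corpus_name = "test_corpus"
corpus = [
    "Adaptor functions run inside a job expression.",
    "Workflows are triggered by webhooks or cron schedules.",
]

# all-MiniLM-L6-v2 produces 384-dimensional vectors
client.create_collection(collection_name=corpus_name, dimension=384)

vectors = model.encode(corpus)
rows = [
    {"id": i, "vector": vectors[i].tolist(), "text": corpus[i]}
    for i in range(len(corpus))
]
client.insert(collection_name=corpus_name, data=rows)
```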
Note that this issue only requires a test corpus to run against - the problem of inputting a real knowledgebase (i.e. embedding docs.openfn.org) is handled in a different issue.
Useful resources
Hello @josephjclark,
I hope you are doing great.
I have been working on this issue and encountered a problem: the embedding model is too large, which significantly increases the time required to build the Docker image. We could overcome this by sending an API request to a Hugging Face endpoint, which can use any model depending on our case. This approach returns the embeddings for our dataset and eliminates the need to include models like sentence-transformers in our dependencies (adding them to the -ft command isn't possible, I guess, since we will need the same model to embed the search queries at runtime).
I have tried using the sentence-transformers model in a local environment, and it works fine. However, the Docker build takes several minutes because dependencies like torch are also being installed during the build.
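Roughly, the API-based approach I have in mind looks like this (the model name and token handling are just placeholders):

```python
# Sketch of fetching embeddings from the Hugging Face Inference API at
# runtime instead of bundling the model in the image.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_TOKEN"])

def embed(text: str) -> list[float]:
    # feature_extraction returns the embedding as a numpy array
    vector = client.feature_extraction(
        text, model="sentence-transformers/all-MiniLM-L6-v2"
    )
    return vector.tolist()
```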
What are your thoughts on this approach?
I am almost done with setting up the embedding of a hardcoded corpus and adding it to a cloud vector database like Zilliz (the managed cloud service for Milvus) during the Docker build. Only the search service needs a bit more work; it should be completed in a couple of days.
Best regards