Explore options for improving RAG
The current RAG implementation isn't great. It is primarily keyword based and isn't agentic; however, it does work with any LLM model.
The implementation could be significantly improved by:
- Allowing the LLM to generate search queries across user databases
- Having a proper vector database implementation
Explore using existing RAG libraries.
After a lot of experimentation with different RAG libraries, I decided to build an agent using langchain
and build a suite of tools the agent can use to answer prompts.
For now, I'm using the legacy langchain agent framework, not the new LangGraph framework, as the legacy one appeared better supported and easier to use.
Sidenote: I've found langchain to be quite buggy. The documentation often doesn't match the actual code implementation. Many of the community libraries have issues.
Implemented tools
- Query tools for all user databases (emails, calendar, files etc.)
- Vector database tool (using TensorFlow for local embedding generation)
- Chat thread query tool
This can easily be expanded in the future to add support for any other langchain built-in tools (e.g. web search, Wikipedia, web page extraction).
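For reference, a minimal sketch of how an agent like this is wired up using LangChain JS's tool-calling agent. The tool body, Bedrock model ID and region are illustrative assumptions, not our exact implementation:

```ts
import { ChatBedrockConverse } from "@langchain/aws";
import { DynamicStructuredTool } from "@langchain/core/tools";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { AgentExecutor, createToolCallingAgent } from "langchain/agents";
import { z } from "zod";

// Illustrative query tool over a user database (the body is a stand-in
// for the real datastore query)
const emailQueryTool = new DynamicStructuredTool({
  name: "query_emails",
  description: "Search the user's email database by keyword",
  schema: z.object({ keywords: z.string() }),
  func: async ({ keywords }) => {
    // ...run a find() against the email datastore using `keywords`...
    return JSON.stringify([]);
  },
});

const tools = [emailQueryTool];

// Claude 3.5 Haiku via AWS Bedrock (model ID / region are assumptions)
const llm = new ChatBedrockConverse({
  model: "anthropic.claude-3-5-haiku-20241022-v1:0",
  region: "us-east-1",
});

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "Answer questions using the user's personal data tools."],
  ["human", "{input}"],
  ["placeholder", "{agent_scratchpad}"],
]);

const agent = createToolCallingAgent({ llm, tools, prompt });
const executor = new AgentExecutor({ agent, tools });

const result = await executor.invoke({
  input: "How many emails have I received from Ryan?",
});
console.log(result.output);
```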
Results
Results so far are much better, as the LLM can easily search user data and run semantic searches against the vector database to access the most relevant information for answering a prompt.
It can now answer things like: "What upcoming meetings do I have?", "How many emails have I received from Ryan?", "Review my sent emails to create a personality profile", "How often do I send things?".
The agent supports using multiple tools in sequence, so it may request some data from the database and then do a vector database search to get more results (or vice versa). The agent is in complete control of this process, calling the available tools until it has enough information to answer the prompt (the sketch below shows how to surface those intermediate steps).
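On that note, LangChain's `AgentExecutor` can return the intermediate tool calls, which makes the agent's sequencing visible. A sketch reusing the `agent` and `tools` from the earlier example:

```ts
// Rebuild the executor so it reports each tool invocation it made
const executor = new AgentExecutor({
  agent,
  tools,
  returnIntermediateSteps: true, // include (action, observation) pairs in the result
});

const result = await executor.invoke({
  input: "What upcoming meetings do I have?",
});

// Each step records which tool was called, with what input, and what came back
for (const step of result.intermediateSteps) {
  console.log(step.action.tool, step.action.toolInput, step.observation);
}
```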
LLM Model Notes
Using a langchain agent with tools requires an LLM interface that supports tool calling. Currently this limits us to Anthropic models: we are using AWS Bedrock, and that's the only model family there that supports using tools. I have chosen Claude 3.5 Haiku for now. I would prefer to be using Llama, as it's open source and self-hosted via RedPill.
I have experimented with RedPill.ai (https://red-pill.ai/), which is fast and runs in TEEs; however, their API interface doesn't appear to be compatible with langchain. I have asked them for more info on this.
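If their API turns out to be OpenAI-compatible, one avenue to test is pointing LangChain's `ChatOpenAI` at a custom base URL. The endpoint, key, and model ID below are assumptions read from env vars, not confirmed RedPill details:

```ts
import { ChatOpenAI } from "@langchain/openai";

// Assumes RedPill exposes an OpenAI-compatible endpoint (unconfirmed);
// URL, key, and model name are supplied via env vars rather than guessed
const llm = new ChatOpenAI({
  model: process.env.REDPILL_MODEL ?? "llama-3.1-70b-instruct", // hypothetical ID
  apiKey: process.env.REDPILL_API_KEY,
  configuration: {
    baseURL: process.env.REDPILL_API_BASE_URL,
  },
});
```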
Embeddings
I am using TensorFlow for arm64, run locally, to generate embeddings. This is quite a slow process on my local Mac. Once we have a secure connection to RedPill via langchain, this could potentially be replaced with a remote service, which should also provide faster performance.
The embeddings are currently stored using the CloseVector library and are cached on disk for increased performance.
It is worth considering encrypting these generated vector database indexes and storing them somewhere for fast reloading.
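A sketch of the current shape of this setup, using LangChain's TensorFlow embeddings with a CloseVector store cached on disk (the cache path and document shape are illustrative):

```ts
import "@tensorflow/tfjs-node"; // native TF bindings (arm64 build on Apple Silicon)
import { TensorFlowEmbeddings } from "@langchain/community/embeddings/tensorflow";
import { CloseVectorNode } from "@langchain/community/vectorstores/closevector/node";

const CACHE_PATH = "./vector-cache/emails"; // illustrative on-disk cache location

// Embeddings are generated locally via TensorFlow
const embeddings = new TensorFlowEmbeddings();

// Reload the index from the disk cache if present, otherwise build and cache it
async function loadOrBuildIndex(
  texts: string[],
  metadatas: Record<string, unknown>[]
) {
  try {
    return await CloseVectorNode.load(CACHE_PATH, embeddings);
  } catch {
    const store = await CloseVectorNode.fromTexts(texts, metadatas, embeddings);
    await store.save(CACHE_PATH); // cache on disk for fast reloading
    return store;
  }
}
```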
Remaining tasks:
- Cleanup dependencies
- Check code refactor hasn't broken existing code, indexing and hot loading
- Clear cache process (delete in memory indexes, vector store and on-disk vector store)
- Expose new RAG via new `/agent` endpoint
- Implement get records tool for agent to fetch specific records by ID (a rough sketch follows this list)
- Support metadata searching for vector database (timestamp, type, groupId)
- Cleanup LLM key management
- Provide output with information on what tools the agent is calling, latency, tokens used
- New verida-js protocol release with updated pouchdb find fix (also change signing security model)
- Implement `ChatThread` agent tool
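For the get-records task above, a minimal sketch of what the tool could look like. The `datastore` interface and tool name are hypothetical stand-ins for the real Verida datastore API:

```ts
import { DynamicStructuredTool } from "@langchain/core/tools";
import { z } from "zod";

// Hypothetical datastore interface, standing in for the real Verida API
declare const datastore: {
  get(database: string, id: string): Promise<unknown>;
};

const getRecordsTool = new DynamicStructuredTool({
  name: "get_records",
  description:
    "Fetch specific records by ID from a user database (emails, calendar, files, etc.)",
  schema: z.object({
    database: z.string().describe("The user database to read from"),
    ids: z.array(z.string()).describe("Record IDs to fetch"),
  }),
  func: async ({ database, ids }) => {
    const records = await Promise.all(
      ids.map((id) => datastore.get(database, id))
    );
    return JSON.stringify(records);
  },
});
```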
Known issues:
- Token totals in the agent response don't include the final LLM request (langchain doesn't seem to support this out of the box; a possible workaround is sketched below)
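One possible workaround is to aggregate usage with a callback across every LLM call the executor makes. This assumes the underlying provider reports `tokenUsage` in `llmOutput`, which varies by model integration:

```ts
// Accumulate token usage across all LLM calls in a single agent run
const totals = { prompt: 0, completion: 0 };

const result = await executor.invoke(
  { input: "What upcoming meetings do I have?" },
  {
    callbacks: [
      {
        handleLLMEnd(output) {
          // Shape varies by provider; tokenUsage may be undefined for some models
          const usage = output.llmOutput?.tokenUsage;
          totals.prompt += usage?.promptTokens ?? 0;
          totals.completion += usage?.completionTokens ?? 0;
        },
      },
    ],
  }
);

console.log(result.output, totals);
```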
Future enhancements:
LLM update
I have been able to switch from AWS Bedrock to Together.ai and get that working with Llama 3.1 in langchain.
- `Llama-3.1-70B-Instruct-Turbo` produces much worse results than Claude 3.5 Haiku.
- `Llama-3.2-3B-Instruct-Turbo` doesn't support JSON / function calling, so it can't be used as an agent with tools.

Note: Many models don't support JSON / function calling.
This leaves a conundrum: the open source Llama models don't work as well as the proprietary models (ChatGPT, Claude). Perhaps we give the user a choice (a rough sketch follows the list)?
- Claude (better results, but data goes via AWS Bedrock and is subject to their privacy policy)
- Llama (not as good results, but 100% private, running in a TEE via RedPill)
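A sketch of a model-selection factory under that assumption. The Bedrock and Together model IDs are my best guesses and should be verified:

```ts
import { ChatBedrockConverse } from "@langchain/aws";
import { ChatTogetherAI } from "@langchain/community/chat_models/togetherai";

type PrivacyPreference = "quality" | "private";

// Let the user trade answer quality for privacy (model IDs should be verified)
function getLLM(preference: PrivacyPreference) {
  if (preference === "quality") {
    // Claude via AWS Bedrock: better results, AWS privacy policy applies
    return new ChatBedrockConverse({
      model: "anthropic.claude-3-5-haiku-20241022-v1:0",
    });
  }
  // Llama via Together.ai (or RedPill once langchain-compatible):
  // weaker results, but open source
  return new ChatTogetherAI({
    model: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
  });
}
```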
Embeddings update
There is a performance bottleneck associated with creating embeddings and storing them in the vector database. I have also been getting out-of-memory errors on my local Mac when running TensorFlow embeddings locally.
I tried generating embeddings via AWS Bedrock; however, the network latency across lots of small documents made that even slower, plus it costs money and sends more data to AWS infra, which isn't ideal.
LLM Update
I am hitting issues if I directly ask Claude for my personal information due to its privacy controls. Asking it to complete a JSON object with that information seems to work okay though, which is a relief.
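A sketch of that prompt framing, reusing the `executor` from earlier (the JSON fields are illustrative):

```ts
// Frame the request as completing a JSON object rather than asking for
// personal information directly; field names are illustrative
const input = `Complete the following JSON object using the available tools.
Return only the completed JSON:
{
  "fullName": "",
  "homeCity": "",
  "frequentContacts": []
}`;

const result = await executor.invoke({ input });
console.log(result.output);
```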