This repo contains an example of a conversational retrieval system that can route between multiple data sources, choosing the one most relevant to an incoming question. This helps cut down on distraction from off-topic documents that a single vectorstore's similarity search might otherwise pull in, and is particularly important for small models.
🚀 Live version here: https://langchain-cloudflare-nuxt-template.jacob-ee9.workers.dev/
The base version runs entirely on the Cloudflare WorkersAI stack with a tiny open-source Llama 2-7B model, but you can swap in more powerful models such as Anthropic's claude-2 or OpenAI's gpt-3.5-turbo to improve performance in key places if desired.
It uses:
- A chat variant of Llama 2-7B run via the Cloudflare WorkersAI network
- A Cloudflare WorkersAI embeddings model
- Two different Cloudflare Vectorize DBs for different knowledge bases (you could add more!)
- Cloudflare Workers for hosting
- LangChain.js for orchestration
- Nuxt + Vue for the frontend
Here's how it works:
- The vectorstores are populated with ingested data via a local ping to the Nuxt server route under `/server/api/ingest.ts`.
- An incoming question is routed to the Nuxt server route under `/server/api/chat.post.ts`. This route constructs and calls a LangChain Expression Language chain located under `/utils/conversational_retrieval_chain.ts`.
- If there have been previous messages in the conversation, the chain first transforms the original question into a standalone question, free of pronouns and other references to the chat history. This is important since vectorstores return results based on similarity between ingested docs and the query.
- Based on this rephrased query, the chain selects which vectorstore to retrieve from (see the sketch below).
- The chain retrieves context docs from the chosen vectorstore based on the output of the previous step.
- The chain generates a final answer based on this retrieved context and the standalone question.
Here's an illustrative LangSmith trace of the steps involved.
Because we use a small model, removing as much distraction as possible via routing is even more helpful here.
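Below is a simplified, hypothetical sketch of that routing flow in LangChain Expression Language; the real chain lives in `/utils/conversational_retrieval_chain.ts`. Here `model`, `cloudflareRetriever`, and `aiRetriever` are placeholders for the WorkersAI chat model and the two Vectorize-backed retrievers, the prompts are illustrative only, and the sketch condenses the question unconditionally rather than only when chat history is present:

```typescript
import { PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { RunnableSequence } from "@langchain/core/runnables";
import type { BaseLanguageModel } from "@langchain/core/language_models/base";
import type { BaseRetriever } from "@langchain/core/retrievers";

// Placeholders for the WorkersAI chat model and the two Vectorize retrievers.
declare const model: BaseLanguageModel;
declare const cloudflareRetriever: BaseRetriever;
declare const aiRetriever: BaseRetriever;

// Rephrase a follow-up question into a standalone question.
const condensePrompt = PromptTemplate.fromTemplate(
  `Rephrase the follow-up question as a standalone question, with no pronouns or references to the chat history.

Chat history:
{chat_history}

Follow-up question: {question}
Standalone question:`,
);

// Pick the knowledge base most relevant to the standalone question.
const routerPrompt = PromptTemplate.fromTemplate(
  `Which knowledge base best answers the question below? Answer with exactly one word: "cloudflare" or "ai".

Question: {question}`,
);

// Answer from the retrieved context only.
const answerPrompt = PromptTemplate.fromTemplate(
  `Answer the question using only the context below.

Context:
{context}

Question: {question}`,
);

const chain = RunnableSequence.from([
  // 1. Condense the incoming question into a standalone question.
  {
    question: condensePrompt.pipe(model).pipe(new StringOutputParser()),
  },
  // 2. Route to a vectorstore and retrieve context docs.
  {
    question: (input: { question: string }) => input.question,
    context: async (input: { question: string }) => {
      const route = await routerPrompt
        .pipe(model)
        .pipe(new StringOutputParser())
        .invoke({ question: input.question });
      const retriever = route.toLowerCase().includes("ai")
        ? aiRetriever
        : cloudflareRetriever;
      const docs = await retriever.invoke(input.question);
      return docs.map((doc) => doc.pageContent).join("\n\n");
    },
  },
  // 3. Generate the final answer from the retrieved context.
  answerPrompt.pipe(model).pipe(new StringOutputParser()),
]);

// Usage: await chain.invoke({ question: "What is an agent?", chat_history: "" });
```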
The two default data sources are populated in `server/api/ingest.ts` via LangChain document loaders. They are a PDF detailing some of Cloudflare's features and a blog post by Lilian Weng about autonomous agents.
If you want to use alternate data sources, remember to also update the routing steps in the chain!
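For reference, here is a hedged sketch of what ingesting a new web data source might look like, assuming the `@langchain/cloudflare`, `@langchain/community`, and `@langchain/textsplitters` packages. The actual loaders, chunking parameters, and binding names in `server/api/ingest.ts` may differ, and the binding names below are illustrative:

```typescript
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import {
  CloudflareVectorizeStore,
  CloudflareWorkersAIEmbeddings,
} from "@langchain/cloudflare";
import type { Fetcher, VectorizeIndex } from "@cloudflare/workers-types";

// Hypothetical binding names; use the ones declared in your wrangler.toml.
export async function ingestAiDocs(env: {
  AI: Fetcher;
  AI_KNOWLEDGE_VECTORIZE_INDEX: VectorizeIndex;
}) {
  // Load the source document and split it into chunks small enough to embed.
  const loader = new CheerioWebBaseLoader(
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
  );
  const docs = await loader.load();
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1024,
    chunkOverlap: 128,
  });
  const chunks = await splitter.splitDocuments(docs);

  // Embed the chunks with WorkersAI and upsert them into the Vectorize index.
  const embeddings = new CloudflareWorkersAIEmbeddings({
    binding: env.AI,
    model: "@cf/baai/bge-base-en-v1.5",
  });
  const store = new CloudflareVectorizeStore(embeddings, {
    index: env.AI_KNOWLEDGE_VECTORIZE_INDEX,
  });
  await store.addDocuments(chunks);
}
```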
$ npm install
Note that this currently requires you to be on a paid Cloudflare Workers plan.
We configure the databases to work with Cloudflare WorkersAI's `@cf/baai/bge-base-en-v1.5` embeddings model.
$ npx wrangler vectorize create langchain_cloudflare_docs_index --preset @cf/baai/bge-base-en-v1.5
$ npx wrangler vectorize create langchain_ai_docs_index --preset @cf/baai/bge-base-en-v1.5
The names match those found in the default `wrangler.toml` file. If you choose different names, you will need to update the bindings there.
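For orientation, Vectorize bindings in `wrangler.toml` generally look something like the following; the binding names here are the same illustrative ones used in the ingestion sketch above, so match them to whatever the template's `wrangler.toml` actually declares:

```toml
# Illustrative binding names -- check the template's wrangler.toml for the real ones.
[[vectorize]]
binding = "CLOUDFLARE_KNOWLEDGE_VECTORIZE_INDEX"
index_name = "langchain_cloudflare_docs_index"

[[vectorize]]
binding = "AI_KNOWLEDGE_VECTORIZE_INDEX"
index_name = "langchain_ai_docs_index"
```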
Note: If you want to delete your databases, you can run the following commands:
$ npx wrangler vectorize delete langchain_cloudflare_docs_index
$ npx wrangler vectorize delete langchain_ai_docs_index
You can use other presets or parameters for other embedding models. See the Wrangler CLI reference docs for a full list of options.
Copy `.dev.vars.example` to `.dev.vars` and fill in the required variables:
# https://dash.cloudflare.com/
CLOUDFLARE_ACCOUNT_ID=
# https://developers.cloudflare.com/workers-ai/get-started/rest-api/
CLOUDFLARE_WORKERSAI_API_TOKEN=
# For tracing via LangSmith
# https://docs.smith.langchain.com/
LANGCHAIN_TRACING_V2=
LANGCHAIN_SESSION=
LANGCHAIN_API_KEY=
# If swapping in OpenAI somewhere, https://platform.openai.com/api-keys
OPENAI_API_KEY=
ENVIRONMENT="local"
$ npm run build
$ npm run preview
You can combine the previous two steps to reflect changes while developing (Nuxt hot reload coming soon!):
$ npm run reload
Ping http://localhost:3000/api/ingest to populate the Vectorize DBs you set up earlier.
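For example, assuming the dev server is running on the default port:
$ curl http://localhost:3000/api/ingest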
And finally, go to http://localhost:3000 and try asking a question about either Cloudflare or AI!
When you're ready, deploy to Cloudflare Workers via Wrangler with:
$ npm run deploy
You'll need to set private encrypted environment variables in your Cloudflare console under your Worker's Settings -> Variables page for `CLOUDFLARE_WORKERSAI_API_TOKEN`, `CLOUDFLARE_ACCOUNT_ID`, and, if you're using tracing, `LANGCHAIN_API_KEY`.
By default, the only APIs and resources this app uses are within Cloudflare. However, by leveraging frontier models with higher reasoning capabilities at key points in the retrieval chain, we can drastically improve performance without using an excessive number of tokens.
The two places where this increased reasoning power is most helpful are the question rephrasing step and the routing step. Conveniently, both steps are also light on tokens because they do not involve passing a long list of retrieved docs as context.
If you set the following environment variables, this app will use Anthropic's Claude 2 hosted via AWS Bedrock for routing and follow-up question rephrasing:
BEDROCK_AWS_ACCESS_KEY_ID=
BEDROCK_AWS_SECRET_ACCESS_KEY=
BEDROCK_AWS_REGION=
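As a rough illustration of the kind of swap involved (not the template's exact wiring, which lives in `/utils/conversational_retrieval_chain.ts`), a Bedrock-hosted Claude 2 model could be substituted for the routing and rephrasing model roughly like this. Import paths may vary by LangChain.js version, the WorkersAI model name is an assumption, and environment access is simplified; in the deployed app these values come through the Worker's bindings rather than `process.env`:

```typescript
import { BedrockChat } from "@langchain/community/chat_models/bedrock/web";
import { ChatCloudflareWorkersAI } from "@langchain/cloudflare";

// Use Claude 2 on Bedrock for routing/rephrasing when credentials are present,
// otherwise fall back to the small WorkersAI-hosted Llama 2 chat model.
const hasBedrockCredentials =
  !!process.env.BEDROCK_AWS_ACCESS_KEY_ID &&
  !!process.env.BEDROCK_AWS_SECRET_ACCESS_KEY &&
  !!process.env.BEDROCK_AWS_REGION;

const routingModel = hasBedrockCredentials
  ? new BedrockChat({
      model: "anthropic.claude-v2",
      region: process.env.BEDROCK_AWS_REGION!,
      credentials: {
        accessKeyId: process.env.BEDROCK_AWS_ACCESS_KEY_ID!,
        secretAccessKey: process.env.BEDROCK_AWS_SECRET_ACCESS_KEY!,
      },
    })
  : new ChatCloudflareWorkersAI({
      // Assumed model name; use whichever WorkersAI chat model the template is configured with.
      model: "@cf/meta/llama-2-7b-chat-int8",
      cloudflareAccountId: process.env.CLOUDFLARE_ACCOUNT_ID,
      cloudflareApiToken: process.env.CLOUDFLARE_WORKERSAI_API_TOKEN,
    });
```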
To avoid distracting the default Llama 2 model, we do not pass chat history into the final answer generation call. This makes the app incapable of answering certain meta-questions about the conversation, such as "how many questions have I asked?". To properly answer those questions, swap in a more powerful model and uncomment the corresponding line in `utils/conversational_retrieval_chain.ts`.
Special thanks to Pooya Parsa for help with integrating Nuxt with Cloudflare's Wrangler.
For more, follow LangChain on X (formerly Twitter) @LangChainAI.