Yes, it's another chat over documents implementation... but this one is entirely local!
It's a Next.js app that reads the content of an uploaded PDF, chunks it, adds it to a vector store, and performs retrieval-augmented generation (RAG), all client side. You can even turn off your WiFi after the site loads!
You can see a live version at https://webml-demo.vercel.app.
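As a rough sketch of the ingestion flow described above, here's roughly what loading and chunking an uploaded PDF looks like with LangChain.js, where `pdfBlob` is the uploaded file as a `Blob` (module paths and parameters are illustrative and may differ from the actual code in `app/worker.ts`):

```typescript
import { WebPDFLoader } from "langchain/document_loaders/web/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Runs in the browser (e.g. inside a web worker) on the uploaded file's Blob.
const loader = new WebPDFLoader(pdfBlob);
const docs = await loader.load();

// Split the PDF text into overlapping chunks so retrieval can surface focused passages.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
});
const splitDocs = await splitter.splitDocuments(docs);
```

These chunks are then embedded and stored in the in-browser vector store described below.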
Users will need to download and set up Ollama, then run the following commands to allow the site access to a locally running Mistral instance:
```
$ OLLAMA_ORIGINS=https://webml-demo.vercel.app OLLAMA_HOST=127.0.0.1:11435 ollama serve
```
Then, in another terminal window:
```
$ OLLAMA_HOST=127.0.0.1:11435 ollama pull mistral
```
On Windows, use `set` to configure the environment variables instead:

```
$ set OLLAMA_ORIGINS=https://webml-demo.vercel.app
set OLLAMA_HOST=127.0.0.1:11435
ollama serve
```

Then, in another terminal window:

```
$ set OLLAMA_HOST=127.0.0.1:11435
ollama pull mistral
```
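With Ollama serving on port 11435 and the demo's origin allowed via `OLLAMA_ORIGINS` (so the browser's cross-origin requests are accepted), the web app can call the local model directly. Here's a minimal sketch of that call using LangChain.js's Ollama chat wrapper; the import path and option names are assumptions based on `@langchain/community` and may differ from this repo's exact code:

```typescript
import { ChatOllama } from "@langchain/community/chat_models/ollama";

// Points at the locally running Ollama server started above.
const model = new ChatOllama({
  baseUrl: "http://127.0.0.1:11435",
  model: "mistral",
});

const response = await model.invoke(
  "Summarize the key points of the uploaded document."
);
console.log(response.content);
```

In the real app, the prompt also includes retrieved document chunks, which is what makes this RAG rather than plain chat.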
It uses the following:
- Voy as the vector store, fully WASM in the browser.
- Ollama to run an LLM locally and expose it to the web app.
- LangChain.js to call the models, perform retrieval, and generally orchestrate all the pieces.
- Transformers.js to run open source Nomic embeddings in the browser. For more speed on some machines, switch to `"Xenova/all-MiniLM-L6-v2"` in `app/worker.ts` (see the sketch after this list).
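To make the division of labor concrete, here is a hedged sketch of how the embedding and retrieval pieces could fit together in a worker like `app/worker.ts`; the package paths, class names, and the Nomic model id are assumptions based on the `@langchain/community` and `voy-search` packages, not a copy of this repo's code:

```typescript
import { Voy as VoyClient } from "voy-search";
import { VoyVectorStore } from "@langchain/community/vectorstores/voy";
import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers";

// Embeddings run in the browser via Transformers.js.
// Swap the model name to "Xenova/all-MiniLM-L6-v2" for more speed on some machines.
const embeddings = new HuggingFaceTransformersEmbeddings({
  modelName: "nomic-ai/nomic-embed-text-v1",
});

// Voy keeps the index in WASM memory, so nothing leaves the page.
const vectorStore = new VoyVectorStore(new VoyClient(), embeddings);

// `splitDocs` comes from the chunking step sketched earlier.
await vectorStore.addDocuments(splitDocs);

// Retrieve the chunks most relevant to a user question.
const relevantChunks = await vectorStore.similaritySearch(
  "What does the document say about pricing?",
  4
);
```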
I wanted to run as much of the app as possible directly in the browser, but you can swap in Ollama embeddings as well.
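If you'd rather not run an embedding model in the browser, here's a rough sketch of swapping in Ollama for embeddings instead (again assuming the `@langchain/community` wrapper; which model you use for embeddings is up to you):

```typescript
import { OllamaEmbeddings } from "@langchain/community/embeddings/ollama";

// Delegates embedding to the locally running Ollama server instead of Transformers.js.
const embeddings = new OllamaEmbeddings({
  baseUrl: "http://127.0.0.1:11435",
  model: "mistral",
});
```

The rest of the pipeline stays the same; only the `embeddings` instance handed to the Voy vector store changes.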
To run/deploy this yourself, simply fork this repo and install the required dependencies with `yarn`.
There are no required environment variables!
For a bit more on this topic, check out my blog post on Ollama or my Google Summit talk on building with LLMs in the browser.
Special thanks to @dawchihliou for making Voy, @jmorgan and @mchiang0610 for making Ollama and for your feedback, and @xenovacom for making Transformers.js.
For more, follow me on Twitter @Hacubu!