Yes, it's another chat over documents implementation... but this one is entirely local!
It's a Next.js app that reads the content of an uploaded PDF, chunks it, adds it to a vector store, and performs retrieval-augmented generation (RAG), all client side. You can even turn off your WiFi after the site loads!
You can see a live version at https://webml-demo.vercel.app.
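As a rough sketch of the ingestion flow described above, here's roughly what loading and chunking an uploaded PDF looks like with LangChain.js, where `pdfBlob` is the uploaded file as a `Blob` (module paths and parameters are illustrative and may differ from the actual code in `app/worker.ts`):

```typescript
import { WebPDFLoader } from "langchain/document_loaders/web/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Runs in the browser (e.g. inside a web worker) on the uploaded file's Blob.
const loader = new WebPDFLoader(pdfBlob);
const docs = await loader.load();

// Split the PDF text into overlapping chunks so retrieval can surface focused passages.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
});
const splitDocs = await splitter.splitDocuments(docs);
```

These chunks are then embedded and stored in the in-browser vector store described below.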
Users will need to download and set up Ollama, then run the following commands to allow the site access to a locally running Mistral instance:
```
$ OLLAMA_ORIGINS=https://webml-demo.vercel.app OLLAMA_HOST=127.0.0.1:11435 ollama serve
```
Then, in another terminal window:
```
$ OLLAMA_HOST=127.0.0.1:11435 ollama pull mistral
```
On Windows, use `set` to configure the environment variables instead:

```
$ set OLLAMA_ORIGINS=https://webml-demo.vercel.app
set OLLAMA_HOST=127.0.0.1:11435
ollama serve
```

Then, in another terminal window:

```
$ set OLLAMA_HOST=127.0.0.1:11435
ollama pull mistral
```
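With Ollama serving on port 11435 and the demo's origin allowed via `OLLAMA_ORIGINS` (so the browser's cross-origin requests are accepted), the web app can call the local model directly. Here's a minimal sketch of that call using LangChain.js's Ollama chat wrapper; the import path and option names are assumptions based on `@langchain/community` and may differ from this repo's exact code:

```typescript
import { ChatOllama } from "@langchain/community/chat_models/ollama";

// Points at the locally running Ollama server started above.
const model = new ChatOllama({
  baseUrl: "http://127.0.0.1:11435",
  model: "mistral",
});

const response = await model.invoke(
  "Summarize the key points of the uploaded document."
);
console.log(response.content);
```

In the real app, the prompt also includes retrieved document chunks, which is what makes this RAG rather than plain chat.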
It uses the following:
- Voy as the vector store, fully WASM in the browser.
- Ollama to run an LLM locally and expose it to the web app.
- LangChain.js to call the models, perform retrieval, and generally orchestrate all the pieces.
- Transformers.js to run open source Nomic embeddings in the browser. For more speed on some machines, switch to `"Xenova/all-MiniLM-L6-v2"` in `app/worker.ts` (see the sketch after this list).
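To make the division of labor concrete, here is a hedged sketch of how the embedding and retrieval pieces could fit together in a worker like `app/worker.ts`; the package paths, class names, and the Nomic model id are assumptions based on the `@langchain/community` and `voy-search` packages, not a copy of this repo's code:

```typescript
import { Voy as VoyClient } from "voy-search";
import { VoyVectorStore } from "@langchain/community/vectorstores/voy";
import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers";

// Embeddings run in the browser via Transformers.js.
// Swap the model name to "Xenova/all-MiniLM-L6-v2" for more speed on some machines.
const embeddings = new HuggingFaceTransformersEmbeddings({
  modelName: "nomic-ai/nomic-embed-text-v1",
});

// Voy keeps the index in WASM memory, so nothing leaves the page.
const vectorStore = new VoyVectorStore(new VoyClient(), embeddings);

// `splitDocs` comes from the chunking step sketched earlier.
await vectorStore.addDocuments(splitDocs);

// Retrieve the chunks most relevant to a user question.
const relevantChunks = await vectorStore.similaritySearch(
  "What does the document say about pricing?",
  4
);
```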
I wanted to run as much of the app as possible directly in the browser, but you can swap in Ollama embeddings as well.
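If you'd rather not run an embedding model in the browser, here's a rough sketch of swapping in Ollama for embeddings instead (again assuming the `@langchain/community` wrapper; which model you use for embeddings is up to you):

```typescript
import { OllamaEmbeddings } from "@langchain/community/embeddings/ollama";

// Delegates embedding to the locally running Ollama server instead of Transformers.js.
const embeddings = new OllamaEmbeddings({
  baseUrl: "http://127.0.0.1:11435",
  model: "mistral",
});
```

The rest of the pipeline stays the same; only the `embeddings` instance handed to the Voy vector store changes.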
To run/deploy this yourself, simply fork this repo and install the required dependencies with `yarn`.
There are no required environment variables!
For a bit more on this topic, check out my blog post on Ollama or my Google Summit talk on building with LLMs in the browser.
Special thanks to @dawchihliou for making Voy, @jmorgan and @mchiang0610 for making Ollama and for your feedback, and @xenovacom for making Transformers.js.
For more, follow me on Twitter @Hacubu!