/wasm-ai

Vercel and web-llm template to run wasm models directly in the browser.

Primary language: TypeScript. License: Apache-2.0.

WASM AI

Everything you need to run LLMs natively in the browser
and look good doing it.



[Demo video: wasmaidemo.mp4]

WASM AI is a quickstart template to run large language models completely in the browser. Modern 7B LLMs (even quantized to q4) are incredibly intelligent - good enough for text-to-SQL search, creative writing, analysis, NLP and other tasks - or to be a friend on an airplane. You can now run them in the browser for complete privacy, at blazing inference speeds, without a cent of cloud costs.

WASM AI puts together work from far more talented people: the folks at MLC LLM, who built the library that compiles Hugging Face models into other formats, and Vercel, who made Vercel AI and the chatbot template.

Key Features

This repo is meant to be a quickstart for building and iterating on local, open-source models in the browser, and even for distributing them as part of larger apps. A few things here might be useful:

  • Two smart, compiled models

    • Dolphin 2.2.1 and OpenHermes-2.5 are provided as compiled wasm-compatible models to test. I can compile other models on request, when I get the time.
  • Swap between local and cloud easily

    • I kept things as compatible as I could with Vercel's AI library, which has useful things like backpressure and streaming. You can swap between local and cloud models by changing these two constants. That's it. This should make testing and validation easier for your apps.
  • Web workers

    • This took some figuring out, but the local model and inference sit inside a worker, so the UI can run more smoothly.
  • UI Bells and whistles

    • Live code and markdown formatting, scroll to bottom, etc. I got most of these from the chatbot template, but I've cleaned out everything else and done a fresh migration.
  • Local Transcription with Whisper

    • In the spirit of doing everything in the browser, Whisper-turbo is now integrated for voice chat directly in the browser. If you only want the basic chat features, pull the just-chat branch.
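
To make the web worker point above concrete, here is a minimal sketch of the worker-side half of such a setup. It is not the code in this repo: the message shapes, `handleRequest`, and the `Generate` type are all hypothetical names picked for illustration, and the model call is abstracted to anything that yields text chunks.

```typescript
// Hypothetical message shapes; the repo's actual worker protocol may differ.
type WorkerRequest = { type: "generate"; id: number; prompt: string };
type WorkerResponse =
  | { type: "token"; id: number; text: string }
  | { type: "done"; id: number }
  | { type: "error"; id: number; message: string };

// Stand-in for the model call: anything that yields text chunks for a prompt.
type Generate = (prompt: string) => AsyncIterable<string>;

// Worker-side handler: forwards each chunk as it arrives, so the UI thread
// only has to render tokens and never blocks on inference.
async function handleRequest(
  req: WorkerRequest,
  generate: Generate,
  post: (res: WorkerResponse) => void,
): Promise<void> {
  try {
    for await (const text of generate(req.prompt)) {
      post({ type: "token", id: req.id, text });
    }
    post({ type: "done", id: req.id });
  } catch (err) {
    // Report failures to the UI instead of letting the worker die silently.
    const message = err instanceof Error ? err.message : String(err);
    post({ type: "error", id: req.id, message });
  }
}

// Inside a real worker file, the wiring would look roughly like:
//   self.onmessage = (e) =>
//     handleRequest(e.data, generate, (r) => self.postMessage(r));
// and on the UI thread:
//   worker.postMessage({ type: "generate", id: 1, prompt: "Hello" });
```

Keeping the handler separate from `self.onmessage` means it can be exercised with a fake generator outside the browser, which is handy while the model itself takes a while to load.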

This repo is the work of one overworked dev and is meant for educational purposes. Use at your own risk!

For other projects, check out wishful search, or say hi on Twitter!

One-click deploy

Deploy your own to Vercel with a single click:

Deploy with Vercel

Usage

Clone the repo. Then:

yarn
yarn dev

That's it!

Not done yet

  1. Error handling - Sometimes things fail. I haven't handled those times yet. For all the other times, there's Masterca-
  2. More support - There's a crypto.randomUUID issue on mobile even on WebGPU-enabled Chrome. I'm torn between patching the web-llm package or asking them to help.
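
For the `crypto.randomUUID` issue, one possible stopgap is a small fallback helper. This is a sketch and not a patch to web-llm: `safeRandomUUID` is a hypothetical name, and it assumes the failure is simply that `crypto.randomUUID` is missing in the page's context (for example, it is only exposed in secure contexts), which may not be the actual cause here.

```typescript
// Hypothetical helper; not part of web-llm or this repo.
// Minimal structural type so the sketch does not depend on DOM typings.
type CryptoLike = {
  randomUUID?: () => string;
  getRandomValues?: (array: Uint8Array) => Uint8Array;
};

function safeRandomUUID(): string {
  const c = (globalThis as unknown as { crypto?: CryptoLike }).crypto;

  // Prefer the native implementation when it exists.
  if (c && typeof c.randomUUID === "function") {
    return c.randomUUID();
  }

  // Otherwise build a version 4 UUID from 16 random bytes.
  const bytes = new Uint8Array(16);
  if (c && typeof c.getRandomValues === "function") {
    c.getRandomValues(bytes);
  } else {
    // Last resort, not cryptographically strong: fine for chat message IDs only.
    for (let i = 0; i < bytes.length; i++) {
      bytes[i] = Math.floor(Math.random() * 256);
    }
  }
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // set version to 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // set variant to 10xx

  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0"));
  return [
    hex.slice(0, 4),
    hex.slice(4, 6),
    hex.slice(6, 8),
    hex.slice(8, 10),
    hex.slice(10, 16),
  ]
    .map((part) => part.join(""))
    .join("-");
}
```

A helper like this only helps where the app itself generates IDs. If the call lives inside the web-llm package, it would still need a patch or an upstream fix.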