For a long time I wanted an excuse to see what all the hype around WebGPU and WebAssembly was about. Then I attended a Rust Wasm meetup and was eager to find a project to learn these technologies.
docVec is a fully working client-side semantic search engine, i.e. the model runs ENTIRELY on the client machine. This is NOT a production-ready project.
My goals for the project were to:
- Use Rust for NN inference
- Use the GPU for model inference and see how mature wgpu is. Luckily, I found the amazing wonnx project. To make this work I had to hack around some issues with running transformers and implement some missing ONNX operators (cf. PR). I am also still re-implementing the project's MatMul broadcasting and trying, where possible, to improve compute shader performance.
- Implement the whole logic in a WebAssembly module written in Rust. The goal here is to understand some of the internals of wasm and the limitations that come with it.
- Keep the JS to a minimum.
- Don't overcomplicate the search engine. For now, a simple flat vector index suffices.
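
To make the last goal concrete, here is a minimal sketch of what a flat vector index can look like: every stored embedding is compared against the query with a squared L2 distance and the k closest texts are returned. The FlatIndex type and its methods are hypothetical names chosen for illustration, not docVec's actual code, and the sketch runs on the CPU with only the standard library.

```rust
/// A flat (brute-force) index: each query is compared against every stored vector.
/// Hypothetical sketch; the names are illustrative, not docVec's actual types.
struct FlatIndex {
    dim: usize,
    vectors: Vec<f32>, // row-major storage, `dim` floats per entry
    texts: Vec<String>,
}

impl FlatIndex {
    fn new(dim: usize) -> Self {
        assert!(dim > 0, "dimension must be positive");
        Self { dim, vectors: Vec::new(), texts: Vec::new() }
    }

    /// Stores one embedding together with the text it was computed from.
    fn add(&mut self, embedding: &[f32], text: &str) {
        assert_eq!(embedding.len(), self.dim, "embedding has wrong dimension");
        self.vectors.extend_from_slice(embedding);
        self.texts.push(text.to_string());
    }

    /// Returns the texts of the `k` entries closest to `query`, nearest first.
    fn search(&self, query: &[f32], k: usize) -> Vec<String> {
        assert_eq!(query.len(), self.dim, "query has wrong dimension");
        let mut scored: Vec<(f32, usize)> = self
            .vectors
            .chunks_exact(self.dim)
            .enumerate()
            .map(|(i, v)| (squared_l2(query, v), i))
            .collect();
        // Sorting by squared distance gives the same order as sorting by distance.
        scored.sort_by(|a, b| a.0.total_cmp(&b.0));
        scored
            .into_iter()
            .take(k)
            .map(|(_, i)| self.texts[i].clone())
            .collect()
    }
}

/// Squared Euclidean (L2) distance between two equally sized vectors.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

A linear scan like this costs one pass over all stored vectors per query, which is usually acceptable for the chunks of a single page and avoids any index-building step.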
- Download the gte-small model from Hugging Face:

      cd model/
      git clone https://huggingface.co/Supabase/gte-small
- Install the ONNX simplifier, onnxsim
- Simplify the model and fix the input batch size and sequence length:

      python -m onnxsim gte-small/onnx/model.onnx gte-small/onnx/sim_model.onnx \
          --overwrite-input-shape "input_ids:1,512" "attention_mask:1,512" "token_type_ids:1,512"
- Install wasm-pack:

      cargo install wasm-pack
- Clone the modified version of wonnx (temporary):

      cd ..
      git clone https://github.com/AmineDiro/wonnx.git
      cd wonnx  # the checkout must run inside the cloned repository
      git checkout broadcast-matmul
- Build the WebAssembly module and serve the page:

      cd ..  # go to project root
      ./build.sh && python3 -m http.server 8000

  Now you can access the semantic search module at http://localhost:8000 🌟
- Backend (wasm):
  - Project scaffolding using wasm-bindgen
  - Generate string embedding using wonnx and the gte-small model:
    - Add Erf operator to wonnx
    - Modify MatMul broadcasting checks (this is temporary)
    - Reimplement correct MatMul with broadcasting
    - Investigate float NaN issues on Vulkan backend for wgpu
  - Tokenize input in wasm using tokenizers
  - Build index:
    - Split page text
    - Embed text using sentence-transformers
  - Load index in wasm module
  - Implement L2 distance and return k nearest neighbors (as a Vec<String>)
- Frontend:
  - Download an example wiki page as simple HTML
  - Loop over the page elements and search for the matching HTML element
  - Highlight just the matching text and a little of the surrounding text
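
For readers unfamiliar with the MatMul broadcasting item in the backend list, the sketch below shows one common case on the CPU: a batch of matrices of shape [batch, m, k] multiplied by a single [k, n] matrix that is reused for every batch entry. This is only an illustration of the semantics under that assumed shape layout. It is not wonnx's compute shader, and matmul_broadcast is a hypothetical helper name.

```rust
/// Multiplies a batch of matrices `a` (shape [batch, m, k], row-major) by a single
/// matrix `b` (shape [k, n]) that is broadcast across the batch dimension.
/// Illustrative CPU reference only; not the wonnx implementation.
fn matmul_broadcast(
    a: &[f32],
    b: &[f32],
    batch: usize,
    m: usize,
    k: usize,
    n: usize,
) -> Vec<f32> {
    assert_eq!(a.len(), batch * m * k, "a must have shape [batch, m, k]");
    assert_eq!(b.len(), k * n, "b must have shape [k, n]");
    let mut out = vec![0.0f32; batch * m * n];
    for bi in 0..batch {
        for i in 0..m {
            for p in 0..k {
                let a_val = a[(bi * m + i) * k + p];
                for j in 0..n {
                    // `b` has no batch index: the same [k, n] matrix is used for every `bi`.
                    out[(bi * m + i) * n + j] += a_val * b[p * n + j];
                }
            }
        }
    }
    out
}
```

A reference like this can be handy for checking GPU output on small inputs, since the only difference from a plain matrix product is that the right-hand operand ignores the batch index.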