DocVec - Wasm meets Semantic search

For a long time I had wanted an excuse to see what all the hype about WebGPU and WebAssembly was about. Then I attended a Rust Wasm meetup and was eager to find a project to learn about these technologies.

DocVec is a fully working, client-side semantic search engine, i.e. the model runs ENTIRELY on the client machine. This is NOT a production-ready project.

My goals for the project were to:

- Use Rust for NN inference.
- Use the GPU for model inference and see how mature wgpu is. Luckily, I found the amazing project wonnx. I had to hack around some issues with running transformers and also implement some missing ONNX operators (cf. PR) for this to work. I am also still re-implementing the project's MatMul broadcasting and trying, if possible, to improve the compute shader performance.
- Implement the whole logic in a WebAssembly module in Rust. The goal here is to understand some internals of Wasm and the limitations that come with it.
- Keep the JS to a minimum.
- Don't overcomplicate the search engine. For now, a simple flat index of vectors suffices.
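The flat index from the last goal, together with the L2 k-nearest-neighbor search in the TODO list, can be sketched in plain Rust as a brute-force scan. This is a minimal sketch under stated assumptions, not DocVec's actual code: `FlatIndex`, `add`, `search` and the row-major storage layout are hypothetical. It ranks by squared L2 distance, which orders neighbors exactly like L2 while skipping the square root.

```rust
/// Hypothetical flat (brute-force) vector index: every query is compared
/// against every stored embedding. Illustration only, not DocVec's code.
struct FlatIndex {
    dim: usize,
    vectors: Vec<f32>,  // row-major: vector i is vectors[i*dim..(i+1)*dim]
    texts: Vec<String>, // text chunk associated with each stored vector
}

impl FlatIndex {
    fn new(dim: usize) -> Self {
        FlatIndex { dim, vectors: Vec::new(), texts: Vec::new() }
    }

    /// Append one embedding and the text it was computed from.
    fn add(&mut self, embedding: &[f32], text: &str) {
        assert_eq!(embedding.len(), self.dim, "embedding has wrong dimension");
        self.vectors.extend_from_slice(embedding);
        self.texts.push(text.to_string());
    }

    /// Squared L2 distance: same ranking as L2, without the sqrt.
    fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
    }

    /// Return the texts of the k nearest stored vectors, closest first.
    fn search(&self, query: &[f32], k: usize) -> Vec<String> {
        assert_eq!(query.len(), self.dim, "query has wrong dimension");
        let mut scored: Vec<(f32, usize)> = self
            .vectors
            .chunks_exact(self.dim)
            .enumerate()
            .map(|(i, v)| (Self::l2_squared(query, v), i))
            .collect();
        // total_cmp gives a total order on f32, so NaN cannot panic the sort.
        scored.sort_by(|a, b| a.0.total_cmp(&b.0));
        scored.into_iter().take(k).map(|(_, i)| self.texts[i].clone()).collect()
    }
}

fn main() {
    let mut index = FlatIndex::new(2);
    index.add(&[0.0, 0.0], "origin");
    index.add(&[1.0, 0.0], "east");
    index.add(&[0.0, 5.0], "far north");
    println!("{:?}", index.search(&[0.9, 0.1], 2)); // ["east", "origin"]
}
```

A flat scan costs O(N·d) per query, which is usually acceptable for the chunks of a single web page; an approximate index would only pay off at much larger scales.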

Maintainer

Download the gte-small model from Hugging Face:

```
cd model/
git clone https://huggingface.co/Supabase/gte-small
```

Install the ONNX simplifier (onnxsim).

Simplify the model and fix the input batch size (1) and sequence length (512):

```
python -m onnxsim gte-small/onnx/model.onnx gte-small/onnx/sim_model.onnx \
  --overwrite-input-shape "input_ids:1,512" "attention_mask:1,512" "token_type_ids:1,512"
```
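Because the simplified graph has a fixed input shape of 1×512, every tokenized query has to be truncated or padded to exactly 512 positions before inference, and the padded positions should then be ignored when pooling token states into a single embedding. The sketch below is hypothetical: it assumes a BERT-style vocabulary with `[PAD]` = 0, `i64` input tensors (the real element type depends on the exported graph), and mean pooling over the last hidden state. The helper names `to_fixed_inputs` and `masked_mean_pool` are made up for illustration, and the truncation is naive (a real tokenizer would truncate before appending `[SEP]`).

```rust
/// Fixed sequence length baked into the simplified model by onnxsim.
const SEQ_LEN: usize = 512;
/// Assumption: BERT-style vocabulary where [PAD] has id 0.
const PAD_ID: i64 = 0;

/// Hypothetical helper: truncate or pad token ids to SEQ_LEN and build the
/// matching attention_mask (1 = real token) and token_type_ids (all 0).
fn to_fixed_inputs(ids: &[i64]) -> (Vec<i64>, Vec<i64>, Vec<i64>) {
    let n = ids.len().min(SEQ_LEN);
    let mut input_ids = ids[..n].to_vec();
    input_ids.resize(SEQ_LEN, PAD_ID);
    let mut attention_mask = vec![1i64; n];
    attention_mask.resize(SEQ_LEN, 0);
    let token_type_ids = vec![0i64; SEQ_LEN]; // single-segment input
    (input_ids, attention_mask, token_type_ids)
}

/// Hypothetical helper: mean-pool a row-major (seq_len x hidden_size) hidden
/// state over real tokens only, so padding does not dilute the embedding.
fn masked_mean_pool(hidden: &[f32], mask: &[i64], hidden_size: usize) -> Vec<f32> {
    let mut sum = vec![0.0f32; hidden_size];
    let mut count = 0.0f32;
    for (row, &m) in hidden.chunks_exact(hidden_size).zip(mask) {
        if m != 0 {
            for (s, &h) in sum.iter_mut().zip(row) {
                *s += h;
            }
            count += 1.0;
        }
    }
    if count > 0.0 {
        for s in &mut sum {
            *s /= count;
        }
    }
    sum
}

fn main() {
    let ids: [i64; 3] = [101, 2023, 102];
    let (input_ids, mask, types) = to_fixed_inputs(&ids);
    println!("{:?} {}", &input_ids[..4], mask.iter().sum::<i64>()); // [101, 2023, 102, 0] 3
    assert!(types.iter().all(|&t| t == 0));

    // Toy hidden state: 3 positions x hidden_size 2, last position is padding.
    let hidden: [f32; 6] = [1.0, 2.0, 3.0, 4.0, 100.0, 100.0];
    let mask_toy: [i64; 3] = [1, 1, 0];
    println!("{:?}", masked_mean_pool(&hidden, &mask_toy, 2)); // [2.0, 3.0]
}
```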

Install wasm-pack
```
cargo install wasm-pack
```

Clone the modified version of wonnx (temporary):

```
cd ..
git clone https://github.com/AmineDiro/wonnx.git
cd wonnx
git checkout broadcast-matmul
```

Build the WebAssembly module and serve the page:

```
cd ..  # go to project root
./build.sh && python3 -m http.server 8000
```

Now you can access the semantic search module at http://localhost:8000 🌟

TODO:

Backend (wasm):
- Project scaffolding using wasm-bindgen
- Generate string embedding using wonnx and gte-small model:
  - Add Erf operator to wonnx
  - Modify MatMul broadcasting checks (this is temporary)
  - Reimplement correct MatMul with broadcasting
  - Investigate float NaN issues on Vulkan backend for wgpu
- Tokenize input in wasm with tokenizers
- Build index:
  - Split page text
  - Embed text using sentence-transformers
  - Load index in wasm module
- Implement L2 distance and return the k nearest neighbors (with Vec<String>)
Frontend:
- Download example wiki page as simple html
- Loop over page elements and search for matching html element
- Highlight just the text and a little bit of the surrounding text
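The MatMul broadcasting items in the TODO list follow `numpy.matmul` semantics, which the ONNX MatMul operator adopts: the last two axes are multiplied as matrices, 1-D operands are temporarily promoted to matrices, and the remaining leading (batch) axes are broadcast right-aligned. Below is a minimal, shape-only sketch of that rule with a hypothetical helper, `matmul_output_shape`; it says nothing about how wonnx's compute shaders actually perform the product.

```rust
/// Right-aligned lookup: leading axes missing from `dims` count as size 1.
fn dim_or_one(dims: &[usize], rank: usize, i: usize) -> usize {
    let offset = rank - dims.len();
    if i < offset { 1 } else { dims[i - offset] }
}

/// Hypothetical helper: output shape of a numpy.matmul-style MatMul,
/// or None if the operand shapes are incompatible. Shapes only.
fn matmul_output_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    if a.is_empty() || b.is_empty() {
        return None; // scalars are not valid MatMul operands
    }
    // Promote 1-D operands: a -> [1, k], b -> [k, 1]; the extra axis is dropped later.
    let (a_vec, b_vec) = (a.len() == 1, b.len() == 1);
    let a2 = if a_vec { vec![1, a[0]] } else { a.to_vec() };
    let b2 = if b_vec { vec![b[0], 1] } else { b.to_vec() };

    // Matrix part: [.., m, k] x [.., k, n].
    let (m, k_a) = (a2[a2.len() - 2], a2[a2.len() - 1]);
    let (k_b, n) = (b2[b2.len() - 2], b2[b2.len() - 1]);
    if k_a != k_b {
        return None;
    }

    // Batch part: broadcast the leading axes, numpy style.
    let (batch_a, batch_b) = (&a2[..a2.len() - 2], &b2[..b2.len() - 2]);
    let rank = batch_a.len().max(batch_b.len());
    let mut out = Vec::with_capacity(rank + 2);
    for i in 0..rank {
        let (da, db) = (dim_or_one(batch_a, rank, i), dim_or_one(batch_b, rank, i));
        let d = if da == db || db == 1 {
            da
        } else if da == 1 {
            db
        } else {
            return None; // e.g. 2 vs 3: not broadcastable
        };
        out.push(d);
    }
    // Re-append the matrix axes, skipping those added by 1-D promotion.
    if !a_vec {
        out.push(m);
    }
    if !b_vec {
        out.push(n);
    }
    Some(out)
}

fn main() {
    // Rank-3 activation times rank-2 weight (384 = gte-small's hidden size).
    println!("{:?}", matmul_output_shape(&[1, 512, 384], &[384, 384])); // Some([1, 512, 384])
}
```

The rank-3 × rank-2 case in `main` is typical of a transformer's linear layers; a check that requires both operands to have the same rank would reject it even though the shapes are valid.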

akashicMarga/docvec