ngxson/wllama

WebAssembly binding for llama.cpp - Enabling in-browser LLM inference

C++MIT

Pinned issues

performance expectations

#4 opened 5 months ago by chadkirby

Open5

PostMessage: Data cannot be cloned, out of memory

#12 opened 4 months ago by flatsiedatsie

Open23

How would you implement RAG / Document chat?

#36 opened 4 months ago by flatsiedatsie

Closed5

Issues

What does `noTEE` do?
#107 opened a month ago by flatsiedatsie
0
error loading model hyperparameters
#106 opened a month ago by flatsiedatsie
3
Unreachable
#62 opened 3 months ago by flatsiedatsie
4
[Feature request] LoRA support
#105 opened a month ago by OKUA1
0
implement KV cache reuse for completion
#101 opened 2 months ago by ngxson
0
main: initialize main example
#96 opened 2 months ago by ngxson
2
Add prettier
#98 opened 2 months ago by ngxson
0
Add support for `AbortController` on downloading model
#83 opened 3 months ago by flatsiedatsie
1
ci: add e2e test
#97 opened 2 months ago by ngxson
0
T5 and Flan-T5 models support (llama_encode)
#86 opened 2 months ago by felladrin
1
Model caching with new download manager?
#87 opened 2 months ago by flatsiedatsie
1
Add support for control vectors
#89 opened 2 months ago by ngxson
0
BitNet support
#69 opened 2 months ago by flatsiedatsie
17
Should all models now be chunked?
#20 opened 2 months ago by flatsiedatsie
3
The mystery of Schrodinger's exit function
#82 opened 3 months ago by flatsiedatsie
2
Failed to build from scratch: llamacpp-wasm-builder, CMake Error (add_executable): Cannot find source file
#76 opened 3 months ago by flatsiedatsie
1
Large models fail to load from cache on iOS browsers, but load and run fine when uncached
#72 opened 3 months ago by felladrin
5
Feature request: Github build workflow
#6 opened 5 months ago by flatsiedatsie
1
Add WebGPU support
#66 opened 3 months ago by ngxson
0
PostMessage: Data cannot be cloned, out of memory
#12 opened 4 months ago by flatsiedatsie
23
After upgrading to version 1.8.0, the async function `loadModelFromUrl` is not completing when using large models
#31 opened 4 months ago by felladrin
6
Feature request: diversify error mesages when loading a model fails
#56 opened 4 months ago by flatsiedatsie
1
unlimited token limit in demo
#71 opened 3 months ago by fabriziosalmi
2
Glitch remixable no-build example
#70 opened 3 months ago by Utopiah
2
[Idea] Use OPFS for storing downloaded files
#38 opened 3 months ago by ngxson
1
Would it be possible to release a new version?
#67 opened 3 months ago by flatsiedatsie
5
Error when loading a model via relative path
#63 opened 3 months ago by felladrin
3
Made a function to build the Model URL Array when detecting the url has the gguf-split pattern `-<number>-of-<number>.gguf`. Would it fit in the lib?
#58 opened 3 months ago by felladrin
2
404-ed model still ended up in wllama_cache?
#57 opened 4 months ago by flatsiedatsie
3
[Idea] Load model from File Blob
#42 opened 4 months ago by ngxson
0
[Idea] Stream data from main thread to worker
#43 opened 4 months ago by ngxson
1
How would you implement RAG / Document chat?
#36 opened 4 months ago by flatsiedatsie
5
Post on Reddit/r/LocalLlama?
#53 opened 4 months ago by flatsiedatsie
8
[Idea] Publish to JSR
#55 opened 4 months ago by ngxson
0
Error when running `h2o-danube2-1.8b-chat` and `phi-2` models when `cache_type_k` is set to `q4_0` or `q8_0`
#54 opened 4 months ago by felladrin
3
warning: munmap failed: Invalid argument
#7 opened 4 months ago by flatsiedatsie
6
Feature request: Add an option for Verbose Mode, for opt-in console logs
#17 opened 4 months ago by felladrin
1
Feature request: get the debug output through a callback?
#40 opened 4 months ago by flatsiedatsie
1
Seeing <|end|> in output
#45 opened 4 months ago by flatsiedatsie
4
performance expectations
#4 opened 5 months ago by chadkirby
5
missing pre-tokenizer type
#41 opened 4 months ago by flatsiedatsie
11
[Idea] Use something better than memfs
#35 opened 4 months ago by ngxson
2
Wllama doesn't load the provided chunks
#44 opened 4 months ago by flatsiedatsie
2
Bug: exception handling is broken
#22 opened 4 months ago by ngxson
0
Feature request: add a built-in way to interrupt inference
#19 opened 4 months ago by flatsiedatsie
0
Out Of Memory error in Wllama with multi-threads on iOS browser
#18 opened 4 months ago by felladrin
4
The current configuration of Emscripten with `PTHREAD_POOL_SIZE=32` for multi-threading may be causing memory wastage
#16 opened 4 months ago by felladrin
3
qwen returns empty string
#11 opened 4 months ago by flatsiedatsie
4
null function or function signature mismatch at wllama.wasm
#10 opened 4 months ago by flatsiedatsie
5
Support for local webpage use?
#5 opened 5 months ago by twoxfh
2