Janus-1.3B-ONNX - Can't create a session. Failed to allocate a buffer
System Info
Transformers.js version: `"@huggingface/transformers": "^3.1.0"`
Environment/Platform
- Website/web-app
- Browser extension
- Server-side (e.g., Node.js, Deno, Bun)
- Desktop app (e.g., Electron)
- Other (e.g., VSCode extension)
Description
I was trying the code provided in the onnx-community/Janus-1.3B-ONNX
repository, but I encountered the following error:
```
ort.webgpu.bundle.min.mjs:2603 Uncaught Error: Can't create a session. Failed to allocate a buffer of size 2079238052.
    at jt (ort.webgpu.bundle.min.mjs:2603:25061)
    at Pr (ort.webgpu.bundle.min.mjs:2603:25240)
    at Kl (ort.webgpu.bundle.min.mjs:2603:34605)
    at mn.loadModel (ort.webgpu.bundle.min.mjs:2603:36389)
    at fn.createInferenceSessionHandler (ort.webgpu.bundle.min.mjs:2603:38145)
    at e.create (ort.webgpu.bundle.min.mjs:6:19471)
    at async createInferenceSession (onnx.js:163:1)
    at async models.js:301:1
    at async Promise.all (:5173/index 0)
    at async constructSessions (models.js:298:1)
```
I believe that 2079238052 bytes (approximately 1.94 GiB) is under the 2 GB limit, so the allocation shouldn't fail. Additionally, I noticed that the file `preprocessor_config.json` is being downloaded or loaded twice.
Reproduction
```js
import { AutoProcessor, MultiModalityCausalLM } from "@huggingface/transformers";

const model_id = "onnx-community/Janus-1.3B-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await MultiModalityCausalLM.from_pretrained(model_id);
```
Can you try using WebGPU? Also, I recommend the following dtypes, depending on whether fp16 is supported or not.
```js
const model_id = "onnx-community/Janus-1.3B-ONNX";

const fp16_supported = true; // do feature check
const model = await MultiModalityCausalLM.from_pretrained(model_id, {
  dtype: fp16_supported
    ? {
        prepare_inputs_embeds: "q4",
        language_model: "q4f16",
        lm_head: "fp16",
        gen_head: "fp16",
        gen_img_embeds: "fp16",
        image_decode: "fp32",
      }
    : {
        prepare_inputs_embeds: "fp32",
        language_model: "q4",
        lm_head: "fp32",
        gen_head: "fp32",
        gen_img_embeds: "fp32",
        image_decode: "fp32",
      },
  device: {
    prepare_inputs_embeds: "wasm", // TODO use "webgpu" when bug is fixed
    language_model: "webgpu",
    lm_head: "webgpu",
    gen_head: "webgpu",
    gen_img_embeds: "webgpu",
    image_decode: "webgpu",
  },
});
```
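For the `// do feature check` placeholder above (replacing the hard-coded `true`), one way to detect fp16 support is to query the WebGPU adapter for the `shader-f16` feature. A minimal sketch, assuming a browser that exposes WebGPU on `navigator.gpu`; if the check fails, fall back to the fp32 dtypes:

```js
// Sketch of an fp16 feature check; resolves to false if WebGPU is unavailable.
let fp16_supported = false;
try {
  const adapter = await navigator.gpu.requestAdapter();
  fp16_supported = adapter?.features.has("shader-f16") ?? false;
} catch (e) {
  fp16_supported = false; // no WebGPU, or the adapter request failed
}
```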
It's working now, thanks, @xenova! Much appreciated.
I will wait for the bug to be fixed.
Is it possible to pass the following parameters to the `generate_images` function: `width`, `height`, `cfg_weight`, and `parallel_size`?
Do you have an example of how to do this in the Python library? My understanding is that the model generates exactly 576 image tokens (24 × 24, which at 16 pixels per token corresponds to 384 × 384), so unless the image decoder can produce higher-resolution images, it can currently only generate 384x384 images.
My bad, you're right: it's 384x384 after checking their docs. But what about `guidance` and `parallel_size`?
You should be able to pass e.g. `guidance_scale: 4` to the `generate_images` function. Batched generation technically works, but needs a bit more experimentation.
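For reference, a minimal sketch of such a call, assuming `inputs` is the processor output for a text-to-image prompt; only `guidance_scale` is confirmed above, the rest is illustrative:

```js
// Sketch only: `inputs` is assumed to be the processor output for a text-to-image prompt.
const outputs = await model.generate_images({
  ...inputs,         // prompt tensors prepared by the processor (assumption)
  guidance_scale: 4, // the classifier-free guidance weight mentioned above
});
```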