microsoft/Olive

Llava-7b model Conversion to ONNX and Latency Optimization - OOM error (even after setting paging file size)

Harini-Vemula-2382 opened this issue · 2 comments

Describe the bug
When attempting to run the optimization process with the llm.py script, I encounter a "not enough memory" error, even after increasing the paging file size to its maximum.

To Reproduce
Steps to reproduce the behavior.

Expected behavior
Please advise on how to resolve this issue.

Olive config
Add Olive configurations here.

Olive logs
Add logs here.

Other information

  • OS: [e.g. Windows, Linux]
  • Olive version: [e.g. 0.4.0 or main]
  • ONNXRuntime package and version: [e.g. onnxruntime-gpu: 1.16.1]

Additional context
Please help to convert and execute LLava on DirectML.
(Screenshots attached: llava_1, llava_2, Memory_Error)

@PatriceVignola could you look at this? Thanks!

Hi @Harini-Vemula-2382,

This error comes from DirectML itself, which indicates that the GPU doesn't have enough VRAM to load the model.
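
For context, the weights of a 7B-parameter model alone typically exceed the VRAM of most consumer GPUs. A rough back-of-envelope sketch (approximate sizes; actual memory use during conversion is higher due to activations and intermediate buffers):

```python
# Rough estimate of memory needed just to hold model weights in memory.
# These are approximations, not exact figures for Llava-7b.

def weight_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate weight size in GiB for a given parameter count and precision."""
    return n_params * bytes_per_param / 1024**3

params = 7e9  # ~7 billion parameters

fp32_gib = weight_gib(params, 4)  # fp32: 4 bytes per parameter
fp16_gib = weight_gib(params, 2)  # fp16: 2 bytes per parameter
int4_gib = weight_gib(params, 0.5)  # 4-bit quantized: 0.5 bytes per parameter

print(f"fp32 weights: ~{fp32_gib:.1f} GiB")
print(f"fp16 weights: ~{fp16_gib:.1f} GiB")
print(f"int4 weights: ~{int4_gib:.1f} GiB")
```

Even in fp16, roughly 13 GiB of VRAM is needed for the weights alone, which is why lower-precision quantization (or a GPU with more VRAM) is usually required for 7B-class models on consumer hardware.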