microsoft/Olive

Llava-7b model Conversion to ONNX and Latency Optimization - OOM error (even after setting paging file size)

Harini-Vemula-2382 opened this issue · 2 comments

Describe the bug
When attempting to run the optimization process with the llm.py script, I encounter a "not enough memory" error, even after increasing the paging file size to its maximum.

To Reproduce
Steps to reproduce the behavior.

Expected behavior
Please advise on how to resolve this issue.

Olive config
Add Olive configurations here.

Olive logs
Add logs here.

Other information

  • OS: [e.g. Windows, Linux]
  • Olive version: [e.g. 0.4.0 or main]
  • ONNXRuntime package and version: [e.g. onnxruntime-gpu: 1.16.1]

Additional context
Please help to convert and execute LLava on DirectML.
(Screenshots attached: llava_1, llava_2, Memory_Error)

@PatriceVignola could you look at this? Thanks!

Hi @Harini-Vemula-2382,

This error comes from DirectML itself, which indicates that the GPU doesn't have enough VRAM to load the model.
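
For context, the weights of a 7B-parameter model alone typically exceed the VRAM of most consumer GPUs. A rough back-of-envelope sketch (approximate sizes; actual memory use during conversion is higher due to activations and intermediate buffers):

```python
# Rough estimate of memory needed just to hold model weights in memory.
# These are approximations, not exact figures for Llava-7b.

def weight_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate weight size in GiB for a given parameter count and precision."""
    return n_params * bytes_per_param / 1024**3

params = 7e9  # ~7 billion parameters

fp32_gib = weight_gib(params, 4)  # fp32: 4 bytes per parameter
fp16_gib = weight_gib(params, 2)  # fp16: 2 bytes per parameter
int4_gib = weight_gib(params, 0.5)  # 4-bit quantized: 0.5 bytes per parameter

print(f"fp32 weights: ~{fp32_gib:.1f} GiB")
print(f"fp16 weights: ~{fp16_gib:.1f} GiB")
print(f"int4 weights: ~{int4_gib:.1f} GiB")
```

Even in fp16, roughly 13 GiB of VRAM is needed for the weights alone, which is why lower-precision quantization (or a GPU with more VRAM) is usually required for 7B-class models on consumer hardware.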