Llava-7b model Conversion to ONNX and Latency Optimization - OOM error (even after setting paging file size)
Harini-Vemula-2382 opened this issue · 2 comments
Harini-Vemula-2382 commented
Describe the bug
When attempting to run the optimization process using the llm.py script, I encounter a "not enough memory" error, even after setting the paging file size to its maximum.
To Reproduce
Steps to reproduce the behavior.
Expected behavior
Please provide an update or a resolution for this issue.
Olive config
Add Olive configurations here.
Olive logs
Add logs here.
Other information
- OS: [e.g. Windows, Linux]
- Olive version: [e.g. 0.4.0 or main]
- ONNXRuntime package and version: [e.g. onnxruntime-gpu: 1.16.1]
Additional context
Please help with converting and executing LLava on DirectML.
jambayk commented
@PatriceVignola could you look at this? Thanks!
PatriceVignola commented
This error comes from DirectML itself, which indicates that the GPU doesn't have enough VRAM to load the model.
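As a rough sanity check of the VRAM explanation above, here is a back-of-the-envelope estimate (a sketch, not Olive or DirectML code; the function name and byte-width values are illustrative) of the memory needed just to hold the weights of a 7B-parameter model:

```python
# Approximate VRAM needed to hold the weights of a 7B-parameter model.
# Activations, KV cache, and ONNX Runtime overhead come on top of this.
def weights_vram_gib(num_params: float, bytes_per_param: float) -> float:
    """Weight memory in GiB for a given parameter count and precision."""
    return num_params * bytes_per_param / 1024**3

fp32 = weights_vram_gib(7e9, 4)    # ~26.1 GiB
fp16 = weights_vram_gib(7e9, 2)    # ~13.0 GiB
int4 = weights_vram_gib(7e9, 0.5)  # ~3.3 GiB
print(f"fp32 ~{fp32:.1f} GiB, fp16 ~{fp16:.1f} GiB, int4 ~{int4:.1f} GiB")
```

So even at fp16, the weights alone need roughly 13 GiB of VRAM, which exceeds what most consumer GPUs offer; a paging file does not help here because the allocation must fit in GPU memory, not system RAM. Weight quantization (e.g. int4) is the usual way to shrink the footprint.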