[Performance]: Memory Usage Fix for gguf.

Question

Closed this issue 2 months ago · 3 comments

Proposal to improve performance

Is there a way to convert the GGUF model to PyTorch first and only then start the engine or Ray worker? Right now the Ray worker already uses 10 GB of RAM before the conversion begins, which leaves me 20 GB for converting, and Ray crashes during the conversion because it runs out of RAM. I'm using two GPUs.
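A minimal sketch of the ordering being asked about, assuming the conversion can run as a standalone step. It runs the conversion in a separate Python process, so the OS reclaims that memory when the process exits, and only afterwards starts the engine. `CONVERT_SCRIPT`, `convert_in_child_process`, and `start_engine` are hypothetical placeholders, not vLLM or Ray APIs. The "conversion" here just copies a file so the example runs without a model.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the real GGUF-to-PyTorch conversion: it only
# copies the file, so the sketch runs without any model or GPU.
CONVERT_SCRIPT = """
import shutil, sys
shutil.copyfile(sys.argv[1], sys.argv[2])
"""


def convert_in_child_process(src: Path, dst: Path) -> None:
    """Run the conversion in its own Python process.

    When the child exits, the OS reclaims all of its memory, so the RAM
    needed for conversion is no longer held when the engine starts.
    """
    subprocess.run(
        [sys.executable, "-c", CONVERT_SCRIPT, str(src), str(dst)],
        check=True,  # raise CalledProcessError if the conversion fails
    )


def start_engine(model_path: Path) -> str:
    """Hypothetical placeholder for starting Ray and the engine."""
    return f"engine started with {model_path.name}"


def main() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "model.gguf"
        dst = Path(tmp) / "model.pt"
        src.write_bytes(b"fake weights")

        # Step 1: convert while nothing else is holding RAM.
        convert_in_child_process(src, dst)
        assert dst.read_bytes() == src.read_bytes()

        # Step 2: only now start the memory-hungry engine.
        return start_engine(dst)


if __name__ == "__main__":
    print(main())
```

Whether this helps depends on the real conversion being separable from engine startup, which is an assumption here.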

Report of performance regression

Is there any way to convert the GGUF model first and then start the Ray instance?

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python env.py`

Answer 1 · 2024-04-30T13:31:04.000Z

Also, how can I convert it manually?

Answer 2 · 2024-04-30T13:32:09.000Z

Please read the doc here

Answer 3 · 2024-04-30T13:34:44.000Z

Okay, thank you very much.