[Performance]: Memory Usage Fix for gguf.
Closed this issue · 3 comments
Abulhanan commented
Proposal to improve performance
Is there a way to convert the GGUF model to PyTorch first and only then start the engine or the Ray worker? At the moment the Ray worker is already using 10 GB of RAM when the conversion begins, which leaves me only 20 GB for converting, and Ray crashes during the conversion because it runs out of memory. I'm using two GPUs.
Report of performance regression
Is there any way to convert the GGUF model first and then start the Ray instance?
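To make the requested ordering concrete, here is a minimal sketch using only the Python standard library. It assumes the conversion can be run as its own process; `run_in_order` and both commands are hypothetical placeholders, not part of any engine's or Ray's API. The idea is that the conversion runs in a child process that exits, and so releases its memory, before the engine or Ray is launched.

```python
import subprocess
import sys


def run_in_order(commands):
    """Run each command to completion before starting the next one.

    Returns the list of exit codes. Stops at the first failure so that a
    later step never starts after a failed one.
    """
    exit_codes = []
    for cmd in commands:
        # subprocess.run blocks until the child process exits, so the
        # memory it used is released before the next step begins.
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
        if result.returncode != 0:
            break
    return exit_codes


if __name__ == "__main__":
    # Both commands are placeholders (hypothetical): substitute your own
    # conversion script and your own engine launch command.
    convert_cmd = [sys.executable, "-c", "print('convert model here')"]
    serve_cmd = [sys.executable, "-c", "print('start engine / Ray here')"]
    print(run_in_order([convert_cmd, serve_cmd]))
```

Whether this helps depends on the engine being able to load the already-converted checkpoint directly instead of converting again at startup.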
Misc discussion on performance
No response
Your current environment (if you think it is necessary)
The output of `python env.py`
Abulhanan commented
Also, how can I convert it manually?
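One possible route for a manual conversion, sketched under assumptions: the Hugging Face `transformers` library accepts a `gguf_file` argument in `from_pretrained`, which dequantizes the GGUF weights into a regular PyTorch model that can then be written out with `save_pretrained`. This assumes `transformers`, `torch`, and the `gguf` package are installed and that the model architecture is one the GGUF loader supports. The function name `convert_gguf_to_pytorch` and its arguments are hypothetical and chosen only for illustration.

```python
def convert_gguf_to_pytorch(repo_or_dir, gguf_filename, output_dir):
    """Dequantize a GGUF checkpoint and save it as a regular PyTorch checkpoint.

    repo_or_dir:   Hub repo id or local directory containing the GGUF file.
    gguf_filename: name of the .gguf file inside repo_or_dir.
    output_dir:    directory where the converted checkpoint is written.
    """
    # Imported lazily so that merely importing this module stays cheap.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # gguf_file tells transformers to read and dequantize the GGUF weights.
    tokenizer = AutoTokenizer.from_pretrained(repo_or_dir, gguf_file=gguf_filename)
    model = AutoModelForCausalLM.from_pretrained(repo_or_dir, gguf_file=gguf_filename)

    # Write a standard checkpoint that can be loaded without GGUF support.
    tokenizer.save_pretrained(output_dir)
    model.save_pretrained(output_dir)
    return output_dir
```

Note that the dequantized model is larger than the GGUF file, so this step itself needs enough free RAM and disk space; running it as a separate process before any Ray worker starts leaves the most memory available.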
Abulhanan commented
Okay, thank you very much.