PygmalionAI/aphrodite-engine

[Performance]: Memory Usage Fix for gguf.

Closed this issue · 3 comments

Proposal to improve performance

Is there a way to convert the GGUF model to PyTorch format first, and only then start the engine or Ray worker? At the moment the Ray worker already uses 10 GB of RAM before the conversion begins, which leaves me only 20 GB for converting, and Ray crashes during the conversion because it runs out of RAM. I'm using two GPUs.

Report of performance regression

Is there any way to convert the GGUF model first and then start the Ray instance?

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python env.py`

Also, how can I convert it manually?

Please read the doc here

Okay, thank you very much.