su77ungr/CASALIOY

New convert.py creating massive files?

hippalectryon-0 opened this issue · 3 comments

Using the "old" convert.py (https://raw.githubusercontent.com/ggerganov/llama.cpp/master/convert.py):
ggml-model-q4_0.bin (4 GB) -> new.bin (4 GB), takes a few seconds

Using the "new" convert.py (the one in main):
ggml-model-q4_0.bin (4 GB) -> modelsnew.bin (26 GB!!!), takes a few minutes

What's going on? ^^

This might also be caused by a newer LlamaCpp update. Revert to the old version until it's fixed.

Found the culprit @alxspiker :P

output_type = pick_output_type(model, "f32")

I'm not sure why we hardcoded "f32", but I think it should be None.

I'll open a PR soon.
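A rough back-of-envelope check (my own estimate, not code from this repo) is consistent with the f32 hardcode being the cause: q4_0 stores roughly 4.5 bits per weight once the per-block scale overhead is counted, while f32 stores 32 bits per weight, so re-exporting a q4_0 file as f32 should inflate it by about 7x.

```python
# Rough size estimate for re-exporting a q4_0 model as f32.
# The 4.5 bits/weight figure for q4_0 is an approximation
# (4-bit weights plus per-block scale), not an exact GGML constant.

BITS_Q4_0 = 4.5   # ~4 bits per weight + per-block scale overhead
BITS_F32 = 32.0

def f32_size_gb(q4_size_gb: float) -> float:
    """Estimate the f32 file size for a model whose q4_0 file is q4_size_gb."""
    return q4_size_gb * BITS_F32 / BITS_Q4_0

print(f"{f32_size_gb(4):.0f} GB")  # a 4 GB q4_0 file -> ~28 GB as f32
```

That lands close to the observed 4 GB -> 26 GB blow-up, which is what you'd expect if the converter is dequantizing everything to f32 instead of preserving the source tensor type.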

Honestly, I just took an existing convert file and changed some things. I don't know much about models in general; I just know how to load and use the llama7b model.