AlexBuz/llama-zip

Using different models like Phi-3

CyberTimon opened this issue · 7 comments

Hello!

Does using smaller models (but with bigger context windows) like Phi-3 result in a worse compression ratio?
Have you already tested this?

Thanks

Short update: I tried Phi-3-Mini, and it leaked memory and completely crashed my M1 Max 64GB MacBook Pro. I'm investigating further.

I just tried Phi-3-Mini-128k and the same thing happened to me (same machine). I'm looking into it now as well.

In parallel, I'm also wondering whether a bigger model would give significantly stronger compression ratios.

As an update on this, it appears the crashing on the M1 Max 64GB results from a combination of these factors:

  • Offloading too many of the model's layers to the GPU
  • Running the model with too large of a context
  • Having mlock enabled

I've now pushed an update that disables mlock by default. However, even with mlock disabled, running Phi-3-Mini with a 128k context and all layers offloaded to the GPU is still very slow, and in fact nondeterministic, on my machine; since compression requires deterministic model outputs to be reversible, that makes it unusable here. Disabling GPU offloading does solve this, but I've also added an option to specify a different (smaller) context length as an alternative solution.
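For anyone who wants to reproduce the workaround outside of llama-zip, the relevant knobs correspond to parameters of llama-cpp-python's `Llama` constructor. Here's a rough sketch with placeholder values (the model path and context size are examples, not llama-zip's actual invocation or defaults):

```python
# Rough sketch using llama-cpp-python; the model path and context size
# are placeholders, not llama-zip's defaults.
from llama_cpp import Llama

llm = Llama(
    model_path="phi-3-mini-128k-instruct.Q8_0.gguf",  # placeholder path
    n_ctx=8192,        # reduced context instead of the full 128k
    n_gpu_layers=0,    # disable GPU offloading to avoid the slowdown/nondeterminism
    use_mlock=False,   # mlock off (now also the default)
    verbose=False,
)
```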

As for the compression ratio of a smaller model with a longer context, stay tuned. I plan to add Phi-3 to the table soon. I'm not sure if I will be able to test any larger models though, as it would take an unreasonable amount of time to run them through my benchmark on my machine.
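For context on what the benchmark measures, it boils down to total compressed size relative to total original size over a corpus, where every token of the input requires a forward pass through the model, which is why larger models take so long to evaluate. A minimal sketch of that loop (the `compress` function below is a hypothetical stand-in for invoking llama-zip, not its actual API):

```python
# Hedged sketch of a compression-ratio benchmark loop; `compress` is a
# hypothetical stand-in for a llama-zip compression call, not its real API.
from pathlib import Path

def compress(data: bytes) -> bytes:
    raise NotImplementedError("stand-in for running llama-zip on one input")

def benchmark(corpus_dir: str) -> float:
    original_bytes = 0
    compressed_bytes = 0
    for path in sorted(Path(corpus_dir).glob("*")):
        data = path.read_bytes()
        original_bytes += len(data)
        compressed_bytes += len(compress(data))
    # Assumed convention: compressed size over original size (smaller is better).
    return compressed_bytes / original_bytes
```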

Phi-3 is now added! Looks like it significantly outperforms Llama 3 on compressing code, and does very well on non-code as well.

Great to hear. Will test it out. Thank you!