exl2
eramax opened this issue · 2 comments
eramax commented
Using exl2 at 2.4 bpw you can run Mixtral on Colab. Did you give it a try?
dvmazur commented
Hey! We are currently looking into other quantization approaches, both to improve inference speed and LM quality. How good is exl2's 2.4 quantization? 2.4 bits per parameter sounds like it would hurt perplexity quite a bit. Could you provide any links so we can look into it?
eramax commented
@dvmazur I made this example for you: https://gist.github.com/eramax/b6fc0b472372037648df7f0019ab0e78
One note: a Colab T4 with 15 GB of VRAM is not enough for the context of Mixtral-8x7B.
If it had 16 GB it would work fine, since we need some VRAM for the context besides the model, and the 2.4 bpw model loads in about 14.7 GB.
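For anyone who doesn't want to open the gist, the loading step is roughly the following (a minimal sketch using exllamav2's standard loading API, not copied from the gist; the model directory path and sampler settings are placeholders):

```python
# Sketch: load a 2.4 bpw exl2 quant of Mixtral-8x7B with exllamav2 on a single GPU.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/content/Mixtral-8x7B-exl2-2.4bpw"  # placeholder path to the quantized model
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # KV cache needs VRAM on top of the ~14.7 GB of weights
model.load_autosplit(cache)               # loads layers onto the GPU as it goes

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8  # placeholder sampling settings

print(generator.generate_simple("Mixtral is", settings, 64))
```

On a 15 GB T4 this leaves very little headroom for the cache, which is why the full context doesn't fit.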