vikhyat/mixtral-inference

GPUs

fakerybakery opened this issue · 1 comment

Hi,
Great repo! You mentioned you need quite a few A100s. If this model is ~50B parameters and people can run Llama 2 70B on a single A100, why does this take so much compute?
Thank you!

I've never tried Llama 70B, but this is running in fp16 without any quantization. That might be part of it?
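A rough back-of-the-envelope sketch of why precision matters so much here (assumptions, not from the thread: Mixtral 8x7B has roughly 46.7B total parameters, fp16 weights take 2 bytes each, a 4-bit quantized model takes about 0.5 bytes per parameter, and an A100 has 80 GiB; activation and KV-cache overhead is ignored):

```python
# Approximate weight-memory math: fp16 vs. 4-bit quantization.
# Assumed figures (not stated in the thread): ~46.7B params for Mixtral 8x7B,
# 2 bytes/param for fp16, 0.5 bytes/param for 4-bit, 80 GiB per A100.

GIB = 1024 ** 3

def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """GiB needed just to hold the model weights (no activations/KV cache)."""
    return num_params * bytes_per_param / GIB

mixtral_fp16 = weight_memory_gib(46.7e9, 2.0)   # ~87 GiB -> exceeds one 80 GiB A100
llama70b_4bit = weight_memory_gib(70e9, 0.5)    # ~33 GiB -> fits on one A100

print(f"Mixtral 8x7B in fp16:  {mixtral_fp16:.0f} GiB")
print(f"Llama 2 70B at 4-bit:  {llama70b_4bit:.0f} GiB")
```

If those assumptions hold, the unquantized fp16 weights alone don't fit on a single 80 GiB A100, whereas the 70B comparisons people cite typically rely on 4-bit or 8-bit quantization.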