GPUs
fakerybakery opened this issue · 1 comment
fakerybakery commented
Hi,
Great repo! You mentioned you need quite a few A100s. If this model is ~50B parameters and people can run Llama 2 70B on a single A100, why does this take so much compute?
Thank you!
vikhyat commented
I've never tried Llama 70B, but this is running in fp16 without any quantization. That might be part of it? At 2 bytes per parameter, ~50B parameters is roughly 100 GB of weights alone, which already exceeds a single 80 GB A100, whereas 70B models typically fit on one A100 only when quantized to 4-bit.
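For a rough sense of the numbers, here is a back-of-the-envelope sketch. The ~50B parameter count comes from the question above and is an assumption; the byte-per-parameter figures are the standard ones for each precision, and this counts weights only (activations and KV cache add more on top).

```python
# Back-of-the-envelope GPU memory estimate for model weights only.
# MODEL_PARAMS is an assumption (~50B, per the question above).

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GB (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

MODEL_PARAMS = 50e9  # assumed ~50B parameters

for label, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    gb = weight_memory_gb(MODEL_PARAMS, bytes_per_param)
    print(f"{label:>5}: ~{gb:.0f} GB of weights")

# Expected output:
#  fp16: ~100 GB  -> more than one 80 GB A100 can hold
#  int8: ~50 GB   -> fits on one A100, with headroom for activations
# 4-bit: ~25 GB   -> comfortable; this is how 70B models run on one A100
```

By the same arithmetic, Llama 2 70B at 4-bit is ~35 GB, which is why it fits on a single A100 while a ~50B model in unquantized fp16 does not.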