thuhcsi/SECap

GPU requirements for training and inference

Closed this issue · 1 comments

Hi, it would be very helpful if GPU-related info could be added to the documentation, so that we know whether we have enough VRAM for training or inference. Thanks!

We trained and ran inference on 40G V100s, using 16-mixed precision for training and float32 for inference. During training, LLaMA was not loaded with float32 parameters, which allowed us to reach a batch size of 16. For inference, we process one audio clip at a time; running the entire test set of 600 sentences took 10 minutes on eight 40G V100s. If you're working with a different GPU, you may want to experiment with the batch size and try different inference strategies.
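Since the underlying question is whether a given card has enough VRAM, a quick back-of-envelope check on weight memory can help before launching a run. This is only a sketch: the 7B parameter count below is an assumption for illustration (the repo may use a different LLaMA size), and real usage adds activations, gradients, optimizer state, and the KV cache on top of the weights.

```python
# Rough VRAM estimate for model weights alone (illustrative only;
# ignores activations, gradients, optimizer state, and KV cache).
def weights_gb(n_params: int, bytes_per_param: int) -> float:
    """Gigabytes needed just to hold the parameters."""
    return n_params * bytes_per_param / 1e9

# Assuming a 7B-parameter LLaMA for illustration:
print(weights_gb(7_000_000_000, 2))  # float16: 14.0 GB
print(weights_gb(7_000_000_000, 4))  # float32: 28.0 GB
```

This is why loading LLaMA in half precision matters on a 40G card: the float32 weights alone would consume most of the VRAM before any training state is allocated.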