Best Performance for a single card for Llama-2-7b-chat-hf
AdityaKulshrestha opened this issue · 0 comments
Hi team,
We are trying to get the best performance out of a single Gaudi card. This is the configuration we are targeting:
Target - Maximize concurrency without degrading TTFT
Model - meta-llama/Llama-2-7b-chat-hf
Input Tokens - 1024
Output Tokens - 256
Acceptable TTFT - 2-2.5 sec
The best concurrency we could achieve was 16 on a single card. Can someone please confirm whether this is the best achievable for this scenario, or whether higher concurrency is possible?
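
For context, here is a rough sketch of the KV-cache footprint for the configuration above. It assumes bf16 (2 bytes per value) and the published Llama-2-7b architecture (32 layers, hidden size 4096, full multi-head attention). The function name and defaults are illustrative only. Model weights, activations, and any padding or bucketing done by the serving stack are not counted, so the real footprint will be higher.

```python
def kv_cache_bytes(batch, seq_len, n_layers=32, hidden=4096, dtype_bytes=2):
    """Estimate KV-cache size in bytes.

    Assumptions (not taken from any serving framework):
    - one K and one V tensor per layer, each of width `hidden`
    - bf16 storage (dtype_bytes=2)
    - no padding, paging overhead, or bucketing
    """
    per_token = 2 * n_layers * hidden * dtype_bytes  # K and V across all layers
    return per_token * seq_len * batch


# Configuration from this issue: 1024 input + 256 output tokens, concurrency 16
seq_len = 1024 + 256
total = kv_cache_bytes(batch=16, seq_len=seq_len)
print(f"KV cache per token: {kv_cache_bytes(1, 1) / 2**20:.2f} MiB")
print(f"KV cache at concurrency 16: {total / 2**30:.1f} GiB")
```

Under these assumptions the cache is about 0.5 MiB per token and about 10 GiB at concurrency 16, which is why we are unsure whether memory or the TTFT limit is the actual constraint here.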