huggingface/tgi-gaudi

Best Performance for a single card for Llama-2-7b-chat-hf

AdityaKulshrestha opened this issue · 0 comments

Hi team,

We are trying to get the best performance out of a single Gaudi card. This is the configuration we are targeting:

Target - Maximize concurrency without degrading TTFT (time to first token)
Model - meta-llama/Llama-2-7b-chat-hf
Input Tokens - 1024
Output Tokens - 256
Acceptable TTFT - 2-2.5 sec

The best concurrency we have been able to achieve so far is 16 on a single card. Can someone please confirm whether this is the best that can be achieved for this scenario?
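
For reference, here is a rough sketch of the token budget these numbers imply. It is plain arithmetic on the figures above, not a measurement. The function name `token_budget` and the dictionary keys are made up for illustration, and the "all requests arrive together" case is an assumed worst case for prefill, not something we have profiled.

```python
# Rough token-budget arithmetic for the scenario described in this issue.
# All names here are illustrative; nothing below calls into TGI itself.

INPUT_TOKENS = 1024   # prompt length per request
OUTPUT_TOKENS = 256   # generated tokens per request
CONCURRENCY = 16      # best concurrency reached so far on one card


def token_budget(input_tokens: int, output_tokens: int, concurrency: int) -> dict:
    """Return the token counts implied by a fixed-shape workload."""
    per_request = input_tokens + output_tokens
    return {
        # Sequence length each request can reach by the end of generation.
        "tokens_per_request": per_request,
        # Assumed worst case: every request is prefilled in the same batch.
        "prefill_tokens_if_all_arrive_together": input_tokens * concurrency,
        # Tokens held in the batch once every request has finished generating.
        "total_tokens_in_flight": per_request * concurrency,
    }


budget = token_budget(INPUT_TOKENS, OUTPUT_TOKENS, CONCURRENCY)
for name, value in budget.items():
    print(f"{name}: {value}")
```

These are the values we would expect to compare against the server's per-request and per-batch token limits (for example `--max-total-tokens` and `--max-batch-total-tokens`, assuming tgi-gaudi exposes the same launcher options as upstream TGI).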