T4 GPUs support? Any recommendation
ggkngktrk opened this issue · 1 comment
ggkngktrk commented
Hello,
First and foremost, thank you for your commendable work and paper. I've been attempting to run Evo locally on T4 GPUs, but ran into the issue that FlashAttention 2.0 does not support them yet. I have a few questions:
Do you have any plans to support T4 GPUs in the near future?
Will a single 16GB T4 GPU be sufficient for inference? If not, can we apply some optimizations (e.g., with DeepSpeed) to the Hugging Face model?
Is there a way to use a FlashAttention 1.x version, or can we disable FlashAttention entirely?
Is it possible to use float16 rather than bfloat16?
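On the question of disabling FlashAttention: for models loaded through Hugging Face `transformers`, one common workaround (not confirmed for Evo specifically) is to select a non-flash attention backend via the `attn_implementation` argument to `from_pretrained`, since FlashAttention 2 requires Ampere-class GPUs (SM 8.0+) and the T4 is Turing (SM 7.5). A minimal sketch of the selection logic — the helper name `pick_attn_implementation` is hypothetical:

```python
def pick_attn_implementation(compute_capability):
    """Choose an attention backend from a (major, minor) CUDA compute capability.

    FlashAttention 2 requires Ampere (SM 8.0) or newer; the T4 is Turing
    (SM 7.5), so it falls back to the standard "eager" implementation.
    """
    return "flash_attention_2" if compute_capability >= (8, 0) else "eager"


# On a real machine you would query the capability with PyTorch:
#   import torch
#   cc = torch.cuda.get_device_capability()   # e.g. (7, 5) on a T4
# and then pass the result to transformers, e.g.:
#   model = AutoModelForCausalLM.from_pretrained(
#       model_id,
#       attn_implementation=pick_attn_implementation(cc),
#       torch_dtype=torch.float16,  # Turing has no bfloat16 support
#   )
print(pick_attn_implementation((7, 5)))  # T4
print(pick_attn_implementation((8, 0)))  # A100
```

Note this only addresses the attention backend; whether 16GB of memory suffices for Evo inference is a separate question.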
Thank you,
exnx commented
Unfortunately, there are no T4 (Turing) implementations of Flash Attention that we're aware of :(