T4 GPUs support? Any recommendation
ggkngktrk opened this issue · 1 comment
ggkngktrk commented
Hello,
First and foremost, thank you for your commendable work and paper. I've been attempting to run Evo locally on T4 GPUs, but ran into the issue that FlashAttention 2.0 does not support them yet. I have a few questions:
Do you have any plans to support T4 GPUs in the near future?
Will a single 16GB T4 GPU be sufficient for inference? If not, can we apply some optimizations (e.g., with DeepSpeed) to the Hugging Face model?
Is there a way to use a FlashAttention 1.x version, or can we disable FlashAttention entirely?
Is it possible to use float16 rather than bfloat16?
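On the question of disabling FlashAttention: for models loaded through Hugging Face `transformers`, one common workaround (not confirmed for Evo specifically) is to select a non-flash attention backend via the `attn_implementation` argument to `from_pretrained`, since FlashAttention 2 requires Ampere-class GPUs (SM 8.0+) and the T4 is Turing (SM 7.5). A minimal sketch of the selection logic — the helper name `pick_attn_implementation` is hypothetical:

```python
def pick_attn_implementation(compute_capability):
    """Choose an attention backend from a (major, minor) CUDA compute capability.

    FlashAttention 2 requires Ampere (SM 8.0) or newer; the T4 is Turing
    (SM 7.5), so it falls back to the standard "eager" implementation.
    """
    return "flash_attention_2" if compute_capability >= (8, 0) else "eager"


# On a real machine you would query the capability with PyTorch:
#   import torch
#   cc = torch.cuda.get_device_capability()   # e.g. (7, 5) on a T4
# and then pass the result to transformers, e.g.:
#   model = AutoModelForCausalLM.from_pretrained(
#       model_id,
#       attn_implementation=pick_attn_implementation(cc),
#       torch_dtype=torch.float16,  # Turing has no bfloat16 support
#   )
print(pick_attn_implementation((7, 5)))  # T4
print(pick_attn_implementation((8, 0)))  # A100
```

Note this only addresses the attention backend; whether 16GB of memory suffices for Evo inference is a separate question.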
Thank you,
exnx commented
Unfortunately, there are no T4 (Turing) implementations of Flash Attention that we're aware of :(