Is CPU only supported?
AlexanderYW opened this issue · 2 comments
Hi,
I'm trying to run the project on my server, which only has a CPU. Is that possible, and if so, which parameters do I need to apply?
I'm already running the container without the "--gpus all" parameter.
I believe I'm running version 1.6.0.
Here is the error I'm getting:
{
"$error": {
"code": "APP_INFERENCE_ERROR",
"name": "ValueError",
"message": "torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU ",
"stack": "Traceback (most recent call last):\n File \"/api/server.py\", line 53, in inference\n output = await user_src.inference(all_inputs, streaming_response)\n File \"/api/app.py\", line 442, in inference\n pipeline.enable_xformers_memory_efficient_attention() # default on\n File \"/api/diffusers/src/diffusers/pipelines/pipeline_utils.py\", line 1453, in enable_xformers_memory_efficient_attention\n self.set_use_memory_efficient_attention_xformers(True, attention_op)\n File \"/api/diffusers/src/diffusers/pipelines/pipeline_utils.py\", line 1479, in set_use_memory_efficient_attention_xformers\n fn_recursive_set_mem_eff(module)\n File \"/api/diffusers/src/diffusers/pipelines/pipeline_utils.py\", line 1469, in fn_recursive_set_mem_eff\n module.set_use_memory_efficient_attention_xformers(valid, attention_op)\n File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 227, in set_use_memory_efficient_attention_xformers\n fn_recursive_set_mem_eff(module)\n File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 223, in fn_recursive_set_mem_eff\n fn_recursive_set_mem_eff(child)\n File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 223, in fn_recursive_set_mem_eff\n fn_recursive_set_mem_eff(child)\n File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 223, in fn_recursive_set_mem_eff\n fn_recursive_set_mem_eff(child)\n File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 220, in fn_recursive_set_mem_eff\n module.set_use_memory_efficient_attention_xformers(valid, attention_op)\n File \"/api/diffusers/src/diffusers/models/attention_processor.py\", line 200, in set_use_memory_efficient_attention_xformers\n raise ValueError(\nValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU \n"
}
}
Hey @AlexanderYW
Unfortunately this isn't a supported use case, although in theory it's possible. Definitely read any notes on the upstream diffusers project repository, as I don't have any knowledge or experience here.
Regarding the above error, you could bypass it in a few ways:

- Pass `{ "callInputs": { "xformers_memory_efficient_attention": false } }` (see the sketch after this list); this is hopefully enough.
- Comment out all the `enable_xformers_memory_efficient_attention()` lines.
- Use the latest `:dev` release / branch, which uses PyTorch 2 and doesn't need xformers anymore.
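For the first option, here's a minimal sketch of a request that disables xformers via `callInputs`. The endpoint, port, and the `modelInputs` wrapper are assumptions about how the container is usually called; adjust them to your deployment.

```python
# Sketch only: URL, port, and payload shape are assumptions, not confirmed
# by this thread; the prompt is purely illustrative.
import requests

payload = {
    "modelInputs": {"prompt": "a photo of an astronaut riding a horse"},
    "callInputs": {
        # Skip pipeline.enable_xformers_memory_efficient_attention(),
        # which raises ValueError when no CUDA device is available.
        "xformers_memory_efficient_attention": False,
    },
}

response = requests.post("http://localhost:8000/", json=payload)
print(response.json())
```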
Hope this is the only blocker, and I'll be interested to hear about your experiences. Definitely open to PRs to document and improve CPU-only support.
Thanks!
Hey @gadicc, after writing this issue I actually looked at the code changes that had been added recently, saw that the :dev release had changed with respect to xformers, and it actually worked.
But thanks for your response; I didn't know about the callInputs key :)