kiri-art/docker-diffusers-api

Is CPU-only supported?

AlexanderYW opened this issue · 2 comments

Hi,

I'm trying to run the project on my server, which only has a CPU. Is that possible, and if so, which parameters do I need to apply?

I'm already running the container without the "--gpus all" parameter

I believe I'm running version 1.6.0

Here is the error I'm getting:

{
    "$error": {
        "code": "APP_INFERENCE_ERROR",
        "name": "ValueError",
        "message": "torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU ",
        "stack": "Traceback (most recent call last):\n  File \"/api/server.py\", line 53, in inference\n    output = await user_src.inference(all_inputs, streaming_response)\n  File \"/api/app.py\", line 442, in inference\n    pipeline.enable_xformers_memory_efficient_attention()  # default on\n  File \"/api/diffusers/src/diffusers/pipelines/pipeline_utils.py\", line 1453, in enable_xformers_memory_efficient_attention\n    self.set_use_memory_efficient_attention_xformers(True, attention_op)\n  File \"/api/diffusers/src/diffusers/pipelines/pipeline_utils.py\", line 1479, in set_use_memory_efficient_attention_xformers\n    fn_recursive_set_mem_eff(module)\n  File \"/api/diffusers/src/diffusers/pipelines/pipeline_utils.py\", line 1469, in fn_recursive_set_mem_eff\n    module.set_use_memory_efficient_attention_xformers(valid, attention_op)\n  File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 227, in set_use_memory_efficient_attention_xformers\n    fn_recursive_set_mem_eff(module)\n  File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 223, in fn_recursive_set_mem_eff\n    fn_recursive_set_mem_eff(child)\n  File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 223, in fn_recursive_set_mem_eff\n    fn_recursive_set_mem_eff(child)\n  File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 223, in fn_recursive_set_mem_eff\n    fn_recursive_set_mem_eff(child)\n  File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 220, in fn_recursive_set_mem_eff\n    module.set_use_memory_efficient_attention_xformers(valid, attention_op)\n  File \"/api/diffusers/src/diffusers/models/attention_processor.py\", line 200, in set_use_memory_efficient_attention_xformers\n    raise ValueError(\nValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU \n"
    }
}
gadicc commented

Hey @AlexanderYW

Unfortunately this isn't a supported use case, although in theory it's possible. Definitely read any notes in the upstream diffusers project repository, as I don't have any knowledge or experience here.
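
For what it's worth, plain diffusers does run on CPU. Here's a minimal sketch using the stock diffusers API rather than anything in this repo (the model id and step count are just examples, and I haven't tested this in the container). Note the float32: fp16 is a GPU optimization and generally fails or crawls on CPU.

    # Minimal CPU-only sketch using stock diffusers, not this repo's code.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # example model id, swap in your own
        torch_dtype=torch.float32,  # fp16 weights generally fail on CPU
    )
    pipe = pipe.to("cpu")  # no .to("cuda"), and no xformers call

    image = pipe(
        "a photo of an astronaut riding a horse",
        num_inference_steps=20,  # CPU is slow; start with fewer steps
    ).images[0]
    image.save("out.png")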

Regarding the above error, you could bypass it in a few ways:

  1. Pass { callInputs: { "xformers_memory_efficient_attention": false } }; hopefully this alone is enough (see the request sketch just after this list).
  2. Comment out all the enable_xformers_memory_efficient_attention() lines (or guard them instead; see the second sketch below).
  3. Use the latest :dev release / branch, which uses PyTorch 2 and doesn't need xformers anymore.
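
For option 1, here's a rough sketch of the request, assuming the container is listening on localhost:8000 (adjust host/port to your deployment; the modelInputs prompt is just an example):

    # Rough sketch: disable xformers per-call via callInputs.
    import requests

    payload = {
        "modelInputs": {"prompt": "a photo of an astronaut riding a horse"},
        "callInputs": {"xformers_memory_efficient_attention": False},
    }

    response = requests.post("http://localhost:8000/", json=payload)
    print(response.json())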

Hope this is the only blocker, and I'll be interested to hear about your experiences. Definitely open to PRs to document and improve CPU-only support.
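
Oh, and if you go with option 2, rather than deleting the calls outright you could guard them so the GPU path keeps working. Untested sketch against the pipeline.enable_xformers_memory_efficient_attention() call in api/app.py:

    import torch

    # Only enable xformers when CUDA is actually available; in a CPU-only
    # container torch.cuda.is_available() is False, so the call is skipped.
    if torch.cuda.is_available():
        pipeline.enable_xformers_memory_efficient_attention()  # default on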

Thanks!

AlexanderYW commented

Hey @gadicc, after writing this issue I actually looked at the recent code changes and saw that the :dev release no longer relies on xformers, and it actually worked.

But thanks for your response; I didn't know about the callInputs key :)