Apple M1 / M2 / MPS support

Question

ormedo opened this issue 3 years ago · 13 comments

Hi!

I just downloaded the project and tried to build and deploy the Docker image on my M1.
I always get the same error:
[+] Building 1320.8s (18/44)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 6.90kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime 1.5s
=> [internal] load build context 0.0s
=> => transferring context: 64.22kB 0.0s
=> CACHED [base 1/5] FROM docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime@sha256:0bc0971dc8ae319af610d493aced87df46255c9508a8b9e9bc365f11a56e7b75 0.0s
=> [base 2/5] RUN if [ -n "" ] ; then echo quit | openssl s_client -proxy $(echo | cut -b 8-) -servername google.com -connect google.com:443 -showcerts | sed 'H;1h; 0.3s
=> [base 3/5] RUN apt-get update 14.3s
=> [base 4/5] RUN apt-get install -yqq git 27.6s
=> [base 5/5] RUN apt-get install -yqq zstd 8.3s
=> [output 1/32] RUN mkdir /api 0.5s
=> [patchmatch 1/3] WORKDIR /tmp 0.0s
=> [patchmatch 2/3] COPY scripts/patchmatch-setup.sh . 0.0s
=> [patchmatch 3/3] RUN sh patchmatch-setup.sh 0.4s
=> [output 2/32] WORKDIR /api 0.0s
=> [output 3/32] RUN conda update -n base -c defaults conda 101.1s
=> [output 4/32] RUN conda create -n xformers python=3.10 33.9s
=> [output 5/32] RUN python --version 6.3s
=> ERROR [output 6/32] RUN conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1 1126.9s

[output 6/32] RUN conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1:
#14 9.049 Collecting package metadata (current_repodata.json): ...working... done
#14 85.41 Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
#14 85.44 Collecting package metadata (repodata.json): ...working... done
#14 489.9 Solving environment: ...working... done
#14 619.6
#14 619.6 ## Package Plan ##
#14 619.6
#14 619.6 environment location: /opt/conda/envs/xformers
#14 619.6
#14 619.6 added / updated specs:
#14 619.6 - cudatoolkit=11.6
#14 619.6 - pytorch=1.12.1
#14 619.6
#14 619.6
#14 619.6 The following packages will be downloaded:
#14 619.6
#14 619.6 package | build
#14 619.6 ---------------------------|-----------------
#14 619.6 blas-1.0 | mkl 6 KB
#14 619.6 ca-certificates-2022.12.7 | ha878542_0 143 KB conda-forge
#14 619.6 certifi-2022.12.7 | pyhd8ed1ab_0 147 KB conda-forge
#14 619.6 cudatoolkit-11.6.0 | hecad31d_10 821.2 MB conda-forge
#14 619.6 intel-openmp-2022.1.0 | h9e868ea_3769 4.5 MB
#14 619.6 mkl-2022.1.0 | hc2b9512_224 129.7 MB
#14 619.6 pytorch-1.12.1 |py3.10_cuda11.6_cudnn8.3.2_0 1.20 GB pytorch
#14 619.6 pytorch-mutex-1.0 | cuda 3 KB pytorch
#14 619.6 typing_extensions-4.4.0 | pyha770c72_0 29 KB conda-forge
#14 619.6 ------------------------------------------------------------
#14 619.6 Total: 2.13 GB
#14 619.6
#14 619.6 The following NEW packages will be INSTALLED:
#14 619.6
#14 619.6 blas pkgs/main/linux-64::blas-1.0-mkl
#14 619.6 cudatoolkit conda-forge/linux-64::cudatoolkit-11.6.0-hecad31d_10
#14 619.6 intel-openmp pkgs/main/linux-64::intel-openmp-2022.1.0-h9e868ea_3769
#14 619.6 mkl pkgs/main/linux-64::mkl-2022.1.0-hc2b9512_224
#14 619.6 pytorch pytorch/linux-64::pytorch-1.12.1-py3.10_cuda11.6_cudnn8.3.2_0
#14 619.6 pytorch-mutex pytorch/noarch::pytorch-mutex-1.0-cuda
#14 619.6 typing_extensions conda-forge/noarch::typing_extensions-4.4.0-pyha770c72_0
#14 619.6
#14 619.6 The following packages will be UPDATED:
#14 619.6
#14 619.6 ca-certificates pkgs/main::ca-certificates-2022.10.11~ --> conda-forge::ca-certificates-2022.12.7-ha878542_0
#14 619.6 certifi pkgs/main/linux-64::certifi-2022.9.24~ --> conda-forge/noarch::certifi-2022.12.7-pyhd8ed1ab_0
#14 619.6
#14 619.6
#14 619.6 Proceed ([y]/n)?
#14 619.6
#14 619.6 Downloading and Extracting Packages

#14 1110.5 CondaError: Downloaded bytes did not match Content-Length
#14 1110.5 url: https://conda.anaconda.org/pytorch/linux-64/pytorch-1.12.1-py3.10_cuda11.6_cudnn8.3.2_0.tar.bz2
#14 1110.5 target_path: /opt/conda/pkgs/pytorch-1.12.1-py3.10_cuda11.6_cudnn8.3.2_0.tar.bz2
#14 1110.5 Content-Length: 1284916176
#14 1110.5 downloaded bytes: 1100035059
#14 1110.5
#14 1110.5
#14 1110.5
#14 1126.1 ERROR conda.cli.main_run:execute(47): `conda run /bin/bash -c conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1` failed. (See above for error)

executor failed running [/opt/conda/bin/conda run --no-capture-output -n xformers /bin/bash -c conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1]: exit code: 1

I understand that it's a download problem, but I don't know Docker well enough to fix it.

Any suggestions?
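For context, the log shows the pytorch package stopping after 1,100,035,059 of its advertised 1,284,916,176 bytes (about 85.6% of the file, with roughly 176 MiB missing), and conda refusing to use the truncated archive. The sketch below illustrates that kind of integrity check with Python's standard urllib. It is a hypothetical illustration, not conda's actual code, and the helper and exception names are made up.

```python
import urllib.request


class TruncatedDownloadError(Exception):
    """Hypothetical error for a download whose size differs from Content-Length."""


def check_download(content_length: int, downloaded: int) -> None:
    """Raise if the number of received bytes differs from the announced length."""
    if downloaded != content_length:
        raise TruncatedDownloadError(
            f"Downloaded bytes did not match Content-Length: got {downloaded} of "
            f"{content_length} ({downloaded / content_length:.1%}), "
            f"difference {content_length - downloaded} bytes"
        )


def download_with_length_check(url: str, dest: str, chunk_size: int = 1 << 20) -> int:
    """Stream url into dest, then compare the byte count with Content-Length."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        header = response.headers.get("Content-Length")
        received = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            received += len(chunk)
    if header is not None:  # servers may omit the header, e.g. for chunked responses
        check_download(int(header), received)
    return received
```

Calling `check_download(1284916176, 1100035059)` with the numbers from the log raises the error and reports 85.6% and a difference of 184881117 bytes.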

Answer 1 · 2022-12-09T12:23:47.000Z

Hey!

I would normally say just rerun the build command, and it will resume from the last successful (cached) step, but as you say, you keep getting the same error.

Is it always on the same file, and the same number of bytes?
Are you using a proxy?
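Rerunning is cheap because Docker caches the layer of every completed step, so only the failed `conda install` step and the steps after it run again. A minimal retry wrapper is sketched below, assuming the docker CLI is on your PATH; the function name is hypothetical.

```python
import subprocess
import time


def build_with_retries(cmd: list, attempts: int = 3, delay_seconds: float = 5.0) -> int:
    """Run a build command until it exits with status 0.

    Returns the number of the attempt that succeeded. Docker keeps the layers
    of completed steps in its cache, so each retry restarts at the step that
    failed rather than from scratch.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay_seconds)  # brief pause before retrying a flaky download
    raise RuntimeError(f"build failed after {attempts} attempts: {' '.join(cmd)}")
```

For example, `build_with_retries(["docker", "build", "-t", "my-diffusers-api", "."])`, where the image tag is a placeholder. Retrying only helps with transient failures, which is why it matters whether the error always hits the same file at the same byte count.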

Answer 2 · 2022-12-09T12:48:47.000Z

Hi!

Thanks for your time supporting us.
I think it could be a bad file or a networking issue.
Upgrading to cudatoolkit 11.7 fixed that problem.

Answer 3 · 2022-12-09T13:26:04.000Z

Hi! Another conflict, this time with the Python version.

#15 2074.9
#15 2074.9 UnsatisfiableError: The following specifications were found
#15 2074.9 to be incompatible with the existing python installation in your environment:
#15 2074.9
#15 2074.9 Specifications:
#15 2074.9
#15 2074.9 - six -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.8,<3.9.0a0|>=3.7,<3.8.0a0|>=3.9,<3.10.0a0|>=3.5,<3.6.0a0']
#15 2074.9 - wheel -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.8,<3.9.0a0|>=3.7,<3.8.0a0|>=3.5,<3.6.0a0']
#15 2074.9 - xformers -> python[version='>=3.7,<3.8.0a0|>=3.8,<3.9.0a0']
#15 2074.9
#15 2074.9 Your python: python=3.10
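Each constraint in that message is an OR (`|`) of comma-joined (AND) version ranges, and none of the ranges listed for xformers admit Python 3.10. The simplified checker below reproduces that verdict. It is a hypothetical helper that only understands `>=`, `<=`, `==`, `>` and `<`, and it drops pre-release tags such as `a0` (so `<3.8.0a0` is treated as `<3.8.0`, which is equivalent for final releases); conda's own MatchSpec logic is considerably more complete.

```python
import operator
import re

OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
CLAUSE = re.compile(r"(>=|<=|==|>|<)(.+)")


def parse_version(text: str) -> tuple:
    """'3.8.0a0' -> (3, 8, 0): keep leading numeric parts, drop pre-release tags."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. '0a0': stop after the numeric prefix
            break
    return tuple(parts)


def _clause_holds(current: tuple, clause: str) -> bool:
    op, bound = CLAUSE.fullmatch(clause.strip()).groups()
    limit = parse_version(bound)
    width = max(len(current), len(limit))
    # Pad with zeros so that (3, 8) compares equal to (3, 8, 0).
    left = current + (0,) * (width - len(current))
    right = limit + (0,) * (width - len(limit))
    return OPERATORS[op](left, right)


def satisfies(version: str, spec: str) -> bool:
    """True if version falls inside any '|'-separated range of the spec."""
    current = parse_version(version)
    return any(
        all(_clause_holds(current, clause) for clause in alternative.split(","))
        for alternative in spec.split("|")
    )


XFORMERS = ">=3.7,<3.8.0a0|>=3.8,<3.9.0a0"  # copied from the error message above
print(satisfies("3.10", XFORMERS))  # False
print(satisfies("3.8", XFORMERS))   # True
```

By the same check, Python 3.8 falls inside an allowed range for all three packages listed (six, wheel and xformers).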

Answer 4 · 2022-12-09T13:43:03.000Z

Hey! Unfortunately xformers only has precompiled binaries for a very select list of package version combinations (I have some notes about this at the top of the Dockerfile). 11.7 won't work. You could try 11.3 though.

P.S. I don't know much about running diffusers on an M1 beyond the fact that it's possible. You may well need to search the docker-diffusers-api codebase for anywhere I've written cuda and replace it with mps. I'll try to fix this in a future release so this won't be necessary (you're the first person to try this 😅). Please do report your findings; I'd love to get this working for all M1 users!
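That search can be done with `grep -rni cuda .` on macOS or Linux. An equivalent Python sketch that reports file and line numbers is below; the function name is hypothetical, and scanning only `.py` files by default is an assumption (pass `(".py", "Dockerfile")` to include the Dockerfile, which also pins CUDA versions).

```python
import os
import re

CUDA_PATTERN = re.compile(r"cuda", re.IGNORECASE)


def find_cuda_references(root: str, suffixes: tuple = (".py",)) -> list:
    """Return (relative_path, line_number, line) for every line mentioning cuda."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git and visit the rest in a stable order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if CUDA_PATTERN.search(line):
                        hits.append((os.path.relpath(path, root), number, line.rstrip()))
    return hits
```

Usage: `for path, number, text in find_cuda_references(".", (".py", "Dockerfile")): print(f"{path}:{number}: {text}")`.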

Answer 5 · 2022-12-09T22:11:23.000Z

It works on M1 with 11.3 :D but the container exited after a few seconds with no visible logs, at least none that I could find :S

Answer 6 · 2022-12-09T22:12:16.000Z

ormedo commented 3 years ago

Answer 7 · 2022-12-10T13:07:16.000Z

Here are the logs from inside the container.

Traceback (most recent call last):
  File "/api/server.py", line 12, in <module>
    user_src.init()
  File "/api/app.py", line 53, in init
    "device": torch.cuda.get_device_name(),
  File "/opt/conda/envs/xformers/lib/python3.10/site-packages/torch/cuda/__init__.py", line 329, in get_device_name
    return get_device_properties(device).name
  File "/opt/conda/envs/xformers/lib/python3.10/site-packages/torch/cuda/__init__.py", line 359, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/opt/conda/envs/xformers/lib/python3.10/site-packages/torch/cuda/__init__.py", line 211, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
ERROR conda.cli.main_run:execute(47): conda run /bin/bash -c python3 -u server.py failed. (See above for error)
libc10_cuda.so: cannot open shared object file: No such file or directory
WARNING: libc10_cuda.so: cannot open shared object file: No such file or directory
Need to compile C++ extensions to get sparse attention support. Please run python setup.py build develop
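The AssertionError and the missing libc10_cuda.so both point to a PyTorch build inside the container that has no usable CUDA support, so any torch.cuda call fails. A quick way to confirm what the installed build supports is to print a few attributes available in PyTorch 1.12: `torch.version.cuda` (None when built without CUDA), `torch.cuda.is_available()`, and `torch.backends.mps.is_built()` / `is_available()`. The `build_report` helper below is hypothetical; it takes the module as a parameter so it can be exercised without PyTorch installed.

```python
def build_report(torch_module) -> dict:
    """Summarise which GPU backends an imported torch module was built with."""
    mps = getattr(torch_module.backends, "mps", None)  # absent before PyTorch 1.12
    return {
        "torch_version": torch_module.__version__,
        "compiled_cuda": torch_module.version.cuda,  # None when built without CUDA
        "cuda_available": torch_module.cuda.is_available(),
        "mps_built": bool(mps is not None and mps.is_built()),
        "mps_available": bool(mps is not None and mps.is_available()),
    }


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        for key, value in build_report(torch).items():
            print(f"{key}: {value}")
```

Running this with the container's Python would show directly whether CUDA or MPS is available to PyTorch there.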

Answer 8 · 2022-12-10T13:15:41.000Z

Hey, thanks! Logs make it much easier to understand what's going on.

So yeah, as I suspected, we'll unfortunately have to find any code that references NVIDIA's CUDA and either remove it if it's not needed or replace it with mps where possible, to get this working on Apple M1.

I would really love to make docker-diffusers-api work out of the box on M1, but it's going to be quite a while before I have the time to be actively involved here :(

In the meantime, the line in question can be removed entirely (app.py line 53: device: torch.cuda...). And you'll need to search through all the files for any other mention of "cuda" and replace it with "mps" (especially anything like .to("cuda"), device="cuda", or anything like that).
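Instead of swapping every "cuda" for "mps" by hand, the choice can also be made once at startup. A minimal sketch, assuming PyTorch 1.12 or later (where `torch.backends.mps` exists); the helper names are hypothetical and not part of docker-diffusers-api. `device_name()` is a guarded stand-in for the `torch.cuda.get_device_name()` call on app.py line 53.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer NVIDIA CUDA, then Apple's Metal (MPS) backend, else the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask the installed PyTorch which backend is usable right now."""
    import torch  # imported lazily so pick_device() works without PyTorch

    mps = getattr(torch.backends, "mps", None)  # present from PyTorch 1.12
    return pick_device(
        torch.cuda.is_available(),
        mps is not None and mps.is_available(),
    )


def device_name(device: str) -> str:
    """Guarded replacement for torch.cuda.get_device_name(), which raises
    "Torch not compiled with CUDA enabled" on builds without CUDA."""
    if device == "cuda":
        import torch

        return torch.cuda.get_device_name()
    return device  # for "mps" or "cpu", simply report the backend name
```

With `DEVICE = detect_device()` computed once, calls such as `.to("cuda")` would become `.to(DEVICE)`, and the "device" entry in app.py could use `device_name(DEVICE)` rather than being deleted.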

Again, I wish I could help more, and look into automatically detecting the right GPU, but I just don't have time at the moment, and really am not sure when I will :( But please keep this issue open, please keep us updated with your progress, and I will take a more active role here when I can. I'll also be available to answer questions to the best of my ability (but I really have zero experience with Apple, unfortunately).

Answer 9 · 2022-12-10T13:18:01.000Z

And for future reference:

Answer 10 · 2022-12-10T16:18:41.000Z

I understand.
I want to test before going to production in Banana's environment.
But the 1-click installation works great!

Answer 11 · 2022-12-10T17:00:05.000Z

Oh, awesome! That's great. Thanks for reporting back about that. At least you can still play in the meantime :)

I should have a chance to look at this next week... if we're lucky, it will all just work afterwards. Otherwise it will take a lot longer 😅 Do you know any good places to rent M1s online? I think one of the companies I've used before has them; I'll try to remember 😅

Answer 12 · 2022-12-10T22:11:10.000Z

AWS offers M1 Mac mini instances, if I remember correctly.

Answer 13 · 2022-12-11T20:41:28.000Z

Oh great, thanks!

More future ref stuff for me...

https://pytorch.org/docs/stable/notes/mps.html

https://chrisdare.medium.com/running-pytorch-on-apple-silicon-m1-gpus-a8bb6f680b02