Project-MONAI/monai-deploy-app-sdk

[BUG] MAP build failing during GitHub Action

tomaroberts opened this issue · 5 comments

Describe the bug
I am trying to build a MAP via a GitHub Action; however, the build is failing during/after STEP 19 of the monai-deploy package build process. See screenshot:

[screenshot: packager build output stopping at STEP 19/29]

Steps/Code to reproduce bug
Output of the GitHub Action can be found here: Click Package Map & test MAP end-to-end.

GHA .yml file can be found here: https://github.com/GSTT-CSC/TotalSegmentator-AIDE/blob/24_publish_map/.github/workflows/publish_map.yml

Effectively, running monai-deploy package app --tag map/init:temp -l DEBUG fails. The GitHub Action completes, but unsuccessfully because the MAP build only reaches STEP 19/29.

Expected behavior
I would expect the MAP to build and then be visible via the docker images command; however, it is only partially built and has the tag <none> in the screenshot above.

Environment details (please complete the following information)
GitHub Actions environment

@tomaroberts sorry for the late reply. I initially misunderstood the issue, thinking it was related to App SDK actions.

I've inspected the log from the Action, and in the step running the Packager there was a Docker build error, due to a dependency incompatibility, at this line

I've checked my venv: scipy is at 1.10.1 and numpy is at 1.24.3. In your build log, scipy is at 1.6.3. We may need to see why scipy is at that version.

The step Display installed pip package shows scipy at 1.9.0, but the Packager build pip installs the dependencies on top of the base image, and if the scipy already installed there satisfies the requirement, no upgrade is done.
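As a quick illustration of that pip behaviour (a generic example, not the exact Packager output):

pip install "scipy>=1.6"       # installed scipy already satisfies the spec -> "Requirement already satisfied", no change
pip install --upgrade scipy    # only an explicit upgrade (or an unsatisfied pin) makes pip fetch a newer release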

This led me to the base image: in the SDK v0.5 Packager the default is nvcr.io/nvidia/pytorch:21.11-py3, but I have since changed it to nvcr.io/nvidia/pytorch:22.08-py3 in the main branch, to be released in a patch version.

Please try building the MAP outside of the Actions to confirm the new base image works, using the --base or -b command line option, similar to the example below:
monai-deploy package -b nvcr.io/nvidia/pytorch:22.08-py3 my_app --tag my_app:latest -m model.ts
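To compare the numpy/scipy versions shipped in the two base images before packaging, a quick check like the following should work (just a diagnostic sketch):

for tag in 21.11-py3 22.08-py3; do
    docker run --rm nvcr.io/nvidia/pytorch:$tag \
        python -c "import numpy, scipy; print(numpy.__version__, scipy.__version__)"
done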

@MMelQin – I did monai-deploy package app -b nvcr.io/nvidia/pytorch:22.08-py3 --tag my_app:latest on a local machine and the MAP builds.

The problem, as you suggested, is the version of scipy within the MAP. This originates from the nvcr.io/nvidia/pytorch:22.08-py3 container. Both versions 21.11 and 22.08 use scipy==1.6.3.

I just tested this with the following:

docker pull nvcr.io/nvidia/pytorch:22.08-py3
docker run -it nvcr.io/nvidia/pytorch:22.08-py3
pip list | grep scipy

which gives:

[screenshot: pip list output showing scipy 1.6.3]

I've also just tried building the MAP again with GHA and am seeing a similar problem. It boils down to three package conflicts at lines 2219-2224:

[2023-07-05 13:26:48,368] [DEBUG] (app_packager) - tensorboard 2.9.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.1 which is incompatible.

[2023-07-05 13:26:48,368] [DEBUG] (app_packager) - scipy 1.6.3 requires numpy<1.23.0,>=1.16.5, but you have numpy 1.24.4 which is incompatible.

[2023-07-05 13:26:48,368] [DEBUG] (app_packager) - numba 0.56.0 requires numpy<1.23,>=1.18, but you have numpy 1.24.4 which is incompatible.

I notice that right at the beginning of the MAP package process, it forces pip to install numpy>=1.21.0 (line 44), which means it will always use the latest version of numpy – currently 1.24.4. This is the source of the scipy compatibility conflict.
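For reference, the conflict can be reproduced outside the Packager by upgrading numpy inside the base image to its latest release (which is what the build log above ends up doing) and then asking pip to verify the installed set – a rough sketch:

docker run --rm nvcr.io/nvidia/pytorch:22.08-py3 \
    bash -c 'pip install --upgrade "numpy>=1.21.0" && pip check'
# pip check should then flag the same scipy/numba incompatibilities reported in the log above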

The protobuf compatibility conflict listed above may need further investigation; however, this conflict didn't arise when using nvcr.io/nvidia/pytorch:21.07-py3 as the MAP base image, so perhaps I can roll back to that and it will be fine.

So... how can I edit the MAP Dockerfile to adjust the numpy>=1.21.0 line?

@MMelQin

One idea I just had was to use the --requirements option within the monai-deploy package command to specify numpy<1.21.0.
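Roughly along these lines (the exact flag combination is illustrative – -b and --tag as used earlier in the thread, and I'm assuming -r takes a path to a requirements.txt file):

# hypothetical requirements file pinning numpy below 1.21
echo "numpy<1.21.0" > requirements.txt
# pass it to the Packager via -r/--requirements alongside the other flags
monai-deploy package app \
    -b nvcr.io/nvidia/pytorch:22.08-py3 \
    -r requirements.txt \
    --tag map/init:temp \
    -l DEBUG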

I've just tried this both on a local machine and through GHA.

On my local machine it worked, and if I enter the container via docker run -it I can see that the MAP Python environment has changed to numpy==1.20.4.

However, via GHA, the packager still fails as you can see here.

I notice that ALL GHA runs so far hard fail after Step 19/29. Step 20/29 is a RUN curl ... command – could the failure on GHA be related to that?

I've investigated a bit more and run some more Actions, but no luck so far.

I'm convinced it's something to do with Step 19 failing, which relates to COPYing the requirements.txt file into the MAP:

[2023-07-05 15:16:47,469] [DEBUG] (app_packager) - Step 19/29 : COPY ./pip/requirements.txt /tmp/requirements.txt
failed to export image: failed to create image: failed to get layer sha256:9c0e0507ab0926c8f004b0d4fdc1a2abe04a762e81f733b3ff45743f01365247: layer does not exist

Things I've tried:

* adding `@md.env(pip_packages=...)` within `app.py` to point to the requirements.txt – [GHA here](https://github.com/GSTT-CSC/TotalSegmentator-AIDE/actions/runs/5486135214/jobs/9995832736)

* using the `-r` option within `monai-deploy package ...` – [GHA here](https://github.com/GSTT-CSC/TotalSegmentator-AIDE/actions/runs/5465913806/jobs/9950069495)

Any other suggestions @MMelQin ?

Hi!

I just came across this issue by coincidence. This error could mean that /tmp/requirements.txt already exists in the image and is already identical to the file you are trying to copy.

According to https://stackoverflow.com/questions/51115856/docker-failed-to-export-image-failed-to-create-image-failed-to-get-layer, this error usually shows up when the COPY command makes no difference to the resulting image.
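If that is what is happening here, Step 19 is effectively the second COPY in a contrived Dockerfile like the one below: the destination file already exists and is identical, so the COPY produces an empty layer. Whether that actually breaks the image export seems to depend on the builder/storage-driver combination, so treat this purely as an illustration of the scenario, not a guaranteed reproduction:

# a file to copy, mirroring the ./pip/requirements.txt layout from the Packager log
mkdir -p repro/pip && echo "numpy<1.21.0" > repro/pip/requirements.txt
cat > repro/Dockerfile << 'EOF'
FROM python:3.9-slim
COPY ./pip/requirements.txt /tmp/requirements.txt
# identical second COPY: contributes nothing to the resulting image
COPY ./pip/requirements.txt /tmp/requirements.txt
EOF
docker build -t copy-noop-repro repro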