22.08 nightly container does not launch Dask scheduler properly
The EC2 MNMG notebook currently uses the stable 21.06 container (rapidsai/rapidsai:21.06-cuda11.0-runtime-ubuntu18.04-py3.8) and is able to launch the EC2 cluster successfully.
However, when I replace it with the latest nightly container (rapidsai/rapidsai-nightly:22.08-cuda11.5-runtime-ubuntu20.04-py3.9), the EC2 cluster fails to launch. For some reason, the container fails to start the Dask scheduler on port 8786. (I waited more than 3 hours and the scheduler still didn't come up on 8786.)
TODO: Investigate why python -m distributed.cli.dask_scheduler fails on the latest nightly container.
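While investigating, it may help to poll the scheduler port with a bounded timeout rather than waiting for hours. The following is only a minimal sketch using the standard library; the helper name wait_for_port and its default timeout and interval are hypothetical and are not part of the notebook or of dask-cloudprovider.

```python
import socket
import time


def wait_for_port(host, port=8786, timeout=60.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, or False after `timeout` seconds.

    Hypothetical helper: 8786 is the scheduler port mentioned in this issue;
    the timeout and interval defaults are arbitrary choices for illustration.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows that something is listening,
            # not that the scheduler is healthy.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry until the deadline.
            time.sleep(interval)
    return False
```

For example, wait_for_port(scheduler_ip, timeout=600) would give up after ten minutes, where scheduler_ip stands in for the public address of the EC2 instance.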
I wonder if the environment variable DISABLE_JUPYTER needs to be set to true. The RAPIDS docker image might not be starting Dask at all if it is just blocking on Jupyter as the foreground process.
cluster = EC2Cluster(env_vars={"DISABLE_JUPYTER": "true", **get_aws_credentials()},
...
xref rapidsai/docker#425, but that change was done in January, so I'm surprised we aren't seeing these issues in 22.06 too.
The current notebook uses 21.06. When I switched to 22.06, I got the same issue.
Indeed, after setting DISABLE_JUPYTER=true, I observe the Dask scheduler launching successfully. I will incorporate this in my pull request. Thanks!
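For anyone hitting the same problem, here is a rough sketch of the workaround, assuming dask-cloudprovider is installed and that AWS credentials are passed in as a plain dict (for example, the result of the get_aws_credentials() helper used in the notebook). The function names build_env_vars and launch_cluster are hypothetical, and any other EC2Cluster arguments the notebook passes are omitted.

```python
def build_env_vars(credentials=None):
    """Build the container environment with Jupyter disabled.

    `credentials` is assumed to be a dict of AWS environment variables.
    """
    env = {"DISABLE_JUPYTER": "true"}
    # Same merge order as the snippet earlier in this thread:
    # credential entries are applied after DISABLE_JUPYTER.
    env.update(credentials or {})
    return env


def launch_cluster(
    credentials=None,
    docker_image="rapidsai/rapidsai-nightly:22.08-cuda11.5-runtime-ubuntu20.04-py3.9",
):
    # Imported lazily so build_env_vars works without dask-cloudprovider installed.
    from dask_cloudprovider.aws import EC2Cluster

    # Only the arguments relevant to this issue are shown; the notebook's
    # other settings (instance type, region, and so on) are left out.
    return EC2Cluster(
        env_vars=build_env_vars(credentials),
        docker_image=docker_image,
    )
```

Keeping the environment in its own function makes it easy to print and verify that DISABLE_JUPYTER is actually being sent before spending time on a cluster launch.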
Ah yup, I misread your initial comment as 22.06, but if we are upgrading from 21.06 that makes a lot of sense.
@hcho3 just going through old issues, can this be closed out now?