SSAGESLabs/PySAGES

jaxlib issue on midway3

ndtrung81 opened this issue · 5 comments

Hi all,

I have been trying to keep the pysages module up to date on midway2 and midway3, following the steps described in https://hackmd.io/Jbpc1E2kRbKPLUnmLKwINA

While the abf example script with openmm runs normally, I get the following error with the example scripts under examples/hoomd-blue:

jaxlib.xla_extension.XlaRuntimeError: No matching device found for local hardware

To reproduce this error, on a GPU compute node on midway3 (midway3-0279 in this case) I ran

module load python/anaconda-2021.05 cuda/11.2 openmpi/4.1.2+gcc-7.4.0
source activate pysages3

and then went to examples/hoomd-blue/unbiased and ran

python3 gen_gsd.py

to get start.gsd. Then I ran

python3 unbiased.py

(Screenshot attached: "Screenshot from 2022-08-30 16-17-46")

Maybe I missed something important here. Any suggestion will be appreciated.

(hoomd v2.9.7 installed in this pysages3 environment runs normally.)

Thanks,
-Trung

Hey Trung,

I had to reinstall and test with hoomd-blue on midway3 over the last couple of days.
I load the following modules on midway3:

module purge

module load rcc
module load slurm

module load cuda
module load cmake
module load openmpi
conda activate ls-hoomd

where ls-hoomd is the active conda environment (marked with * in the environment list):

#
elwood                   /home/ludwigschneider/.conda/envs/elwood
ls-hoomd              *  /home/ludwigschneider/.conda/envs/ls-hoomd
mypysages                /home/ludwigschneider/.conda/envs/mypysages
base                     /software/python-anaconda-2020.11-el8-x86_64
anvio-7.1                /software/python-anaconda-2020.11-el8-x86_64/envs/anvio-7.1
arcgis                   /software/python-anaconda-2020.11-el8-x86_64/envs/arcgis
automl_prediction        /software/python-anaconda-2020.11-el8-x86_64/envs/automl_prediction
dask                     /software/python-anaconda-2020.11-el8-x86_64/envs/dask
env_deeplabcut           /software/python-anaconda-2020.11-el8-x86_64/envs/env_deeplabcut
fflip                    /software/python-anaconda-2020.11-el8-x86_64/envs/fflip
geo_jpg                  /software/python-anaconda-2020.11-el8-x86_64/envs/geo_jpg
geospatial               /software/python-anaconda-2020.11-el8-x86_64/envs/geospatial
hoomd                    /software/python-anaconda-2020.11-el8-x86_64/envs/hoomd
img_conversion           /software/python-anaconda-2020.11-el8-x86_64/envs/img_conversion
meep                     /software/python-anaconda-2020.11-el8-x86_64/envs/meep
mkdocs                   /software/python-anaconda-2020.11-el8-x86_64/envs/mkdocs
mpi4py                   /software/python-anaconda-2020.11-el8-x86_64/envs/mpi4py
mrsid                    /software/python-anaconda-2020.11-el8-x86_64/envs/mrsid
openmm                   /software/python-anaconda-2020.11-el8-x86_64/envs/openmm
pmeep                    /software/python-anaconda-2020.11-el8-x86_64/envs/pmeep
pysages                  /software/python-anaconda-2020.11-el8-x86_64/envs/pysages
pytorch-gpu-1.2-cuda-10.0     /software/python-anaconda-2020.11-el8-x86_64/envs/pytorch-gpu-1.2-cuda-10.0
qgis_stable              /software/python-anaconda-2020.11-el8-x86_64/envs/qgis_stable
rstudio                  /software/python-anaconda-2020.11-el8-x86_64/envs/rstudio
test_python_env          /software/python-anaconda-2020.11-el8-x86_64/envs/test_python_env
tf_keras                 /software/python-anaconda-2020.11-el8-x86_64/envs/tf_keras
vertexai                 /software/python-anaconda-2020.11-el8-x86_64/envs/vertexai

I then manually installed jax for CUDA:
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

I also installed hoomd-blue and hoomd-dlext from source.
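After an install like this, it can be worth checking which jaxlib build pip actually picked up. The sketch below assumes that the CUDA wheels from that release index carry a local version tag containing "cuda" (for example a version of the form X.Y.Z+cuda11.cudnn82); that is an assumption about the packaging at the time, and the helper name jaxlib_build_info is hypothetical.

```python
from importlib import metadata


def jaxlib_build_info(dist_name="jaxlib"):
    """Inspect the installed jaxlib version string for a CUDA local tag.

    Assumption: CUDA-enabled wheels are tagged like "0.3.15+cuda11.cudnn82".
    A missing tag is not conclusive, since packaging conventions can differ
    between releases.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return {
            "installed": False,
            "version": None,
            "local_tag": None,
            "looks_like_cuda": False,
        }
    # Split "public+local"; local is "" when there is no "+" part.
    _public, _sep, local = version.partition("+")
    return {
        "installed": True,
        "version": version,
        "local_tag": local or None,
        "looks_like_cuda": "cuda" in local.lower(),
    }


if __name__ == "__main__":
    print(jaxlib_build_info())
```

A result with looks_like_cuda set to False on a GPU node would suggest that a CPU-only wheel was installed, though this heuristic should be treated as a hint rather than proof.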

Hope this helps.

@InnocentBug thanks for sharing. It seems that enforcing python=3.8 when installing packages makes hoomd-blue examples work again. Will give more updates soon.
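Since the fix appears to hinge on the interpreter version, a small guard at the top of a run script can make a mismatch obvious early. This is only a sketch: the function name python_matches is hypothetical, and the (3, 8) default simply mirrors the python=3.8 pin mentioned above.

```python
import sys


def python_matches(required=(3, 8), version_info=None):
    """Return True if the major.minor version equals `required`.

    `version_info` defaults to the running interpreter; passing a tuple
    explicitly makes the check easy to test.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if not python_matches():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Warning: expected Python 3.8, found {found}")
```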

Is this still an issue?

Not an issue at this point. Let's mark this issue as resolved. The current env pysages3 under python/anaconda-2021.05 on midway3 works fine with the examples as far as my tests go. It requires: module load python/anaconda-2021.05 openmpi/4.1.2+gcc-7.4.0 cuda/11.2