Importing mpi4py Unnecessarily when Importing pygeo.geo_utils

Question

Importing mpi4py Unnecessarily when Importing pygeo.geo_utils

akleb opened this issue 4 years ago · 11 comments

Description

When scripting together a bunch of runs its is useful to be able to call os.system or subprocess.run on mpirun commands. During the setup up for various runs when grids are created pygeo cannot currently be used to correct or modify geometries because it will break any calls to command line mpirun. I think this is due to pygeo importing all of its modules whenever it is called, which in some cases also imports mpi4py. System calls cannot be made that use mpirun if mpi4py was previously imported. I am not too familiar with the pygeo structure, but there may be other submodules that this sort of thing also applies to.

Steps to reproduce issue

The steps are pretty much as follows, an example script is seen directly below the steps

import something from pygeo
try to run either os.system("mpirun hostname") or subprocess.run("mpirun hostame", shell=True)

>>> from pygeo.geo_utils import *
>>> import subprocess
>>> print(subprocess.run("mpirun hostname", shell=True))

Current behavior

The current behavior returns a non-zero error code on any system calls with mpirun calls. For the attached script this looks like:

$ python test.py 
CompletedProcess(args='mpirun hostname', returncode=1)

Expected behavior

For the sample script the expected behavior is a list of all of the processors on your machine. For my 24 core workstation it looks like this:

$ python test.py 
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
CompletedProcess(args='mpirun hostname', returncode=0)

Answer 1 · 2021-06-10T19:56:33.000Z

This is probably due to my poor insight, but it looks like mpi4py is not explicitly imported, e.g:

>>> from pygeo.geo_utils import *
>>> mpi4py.__version__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'mpi4py' is not defined

so I am a bit confused about how to address this.
There are workarounds to this I believe (e.g. move all your pyGeo-related stuff to another separate script) but this is something we should figure out.

Answer 2 · 2021-06-10T20:35:40.000Z

My current solution was to move the import pygeo in pyFoil to be inside the function that uses it, which I don't ever call. I read on the mpi4py documentation site that the mpi environment is initialized as soon as import mpi4py is called, which I believe in turn messes up the mpirun calls later. I am guessing these implicit imports still initialize the mpi environment

Answer 3 · 2021-06-22T22:23:50.000Z

As Marco mentioned, geo_utils does not import or use mpi4py, but other codes like DVConstraints do.
@akleb is geo_utils the only pygeo subpackage you use in your cases? If so, would you try changing import pygeo at the top of pyFoil to import pygeo.geo_utils and see if it works?

Answer 4 · 2021-06-23T13:40:41.000Z

In the example I provided I am importing only geo_utils functions and the issue still persists. Perhaps my computer has something else weird going on. Could you try that MWE I provided in the steps to reproduce?

Answer 5 · 2021-06-23T16:01:21.000Z

I can replicate the issue on my machine using your MWE

Answer 6 · 2021-06-23T16:39:35.000Z

You're right, I was misunderstanding how __init__ works.
One thing we could do is to add geo_utils as a subpackage in setup.py (which needs some other changes in repo structure accordingly), then I believe you can import pygeo.geo_utils without importing mpi4py.
(EDIT: I was not able to do this)
If this works, would it be enough for your cases, or do you need any other fixes?

Answer 7 · 2021-06-23T18:24:47.000Z

Actually, it seems not possible to do the above in a backward compatible manner.

Answer 8 · 2021-06-23T23:31:34.000Z

One thing we could do is to add geo_utils as a subpackage in setup.py (which needs some other changes in repo structure accordingly), then I believe you can import pygeo.geo_utils without importing mpi4py.
If this works, would it be enough for your cases, or do you need any other fixes?

If this was the case, then I believe it would work. I just needed to be able to import pygeo.geo_utils without implicitly importing mpi4py. It doesn't seem good practice for pyfoil to import mpi4py implicitly when it really isn't needed at all.

Answer 9 · 2021-06-24T00:37:15.000Z

I believe the way to fix this is just not to include anything that requires optional dependencies (MPI, OpenVSP etc.) in the init. From what I can see this should not impact anyone who is using the "standard" pyGeo functionalities, but those using DVGeometryVSP etc. will have to change their imports. I think this should be okay.

In general, unless you are actually using MPI functionality, it's almost always a bad idea to import mpi4py.

Answer 10 · 2021-06-24T00:52:12.000Z

But DVGeometry.py and DVConstraints.py also use MPI (though just a little), do you intend to refactor these codes and remove their dependencies on MPI?
I think all we need is to import mpi4py in the methods that use MPI, and remove the import at the beginning of the scripts.

Answer 11 · 2021-06-24T01:14:54.000Z

I propose the following:

DVGeometry: There is a single allreduce, so we can bring the MPI import inside the if comm block
DVConstraints: The print statements can be probably removed IMO. As for the f2py wrapping, we can import MPI just in evalTriangulatedSurfConstraint since only geograd requires MPI.