Importing mpi4py Unnecessarily when Importing pygeo.geo_utils
akleb opened this issue · 11 comments
Description
When scripting together a bunch of runs its is useful to be able to call os.system
or subprocess.run
on mpirun
commands. During the setup up for various runs when grids are created pygeo
cannot currently be used to correct or modify geometries because it will break any calls to command line mpirun
. I think this is due to pygeo importing all of its modules whenever it is called, which in some cases also imports mpi4py
. System calls cannot be made that use mpirun
if mpi4py
was previously imported. I am not too familiar with the pygeo
structure, but there may be other submodules that this sort of thing also applies to.
Steps to reproduce issue
The steps are pretty much as follows, an example script is seen directly below the steps
import
something from pygeo- try to run either
os.system("mpirun hostname")
orsubprocess.run("mpirun hostame", shell=True)
>>> from pygeo.geo_utils import *
>>> import subprocess
>>> print(subprocess.run("mpirun hostname", shell=True))
Current behavior
The current behavior returns a non-zero error code on any system calls with mpirun
calls. For the attached script this looks like:
$ python test.py
CompletedProcess(args='mpirun hostname', returncode=1)
Expected behavior
For the sample script the expected behavior is a list of all of the processors on your machine. For my 24 core workstation it looks like this:
$ python test.py
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
akleb-workstation
CompletedProcess(args='mpirun hostname', returncode=0)
This is probably due to my poor insight, but it looks like mpi4py
is not explicitly imported, e.g:
>>> from pygeo.geo_utils import *
>>> mpi4py.__version__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'mpi4py' is not defined
so I am a bit confused about how to address this.
There are workarounds to this I believe (e.g. move all your pyGeo-related stuff to another separate script) but this is something we should figure out.
My current solution was to move the import pygeo
in pyFoil
to be inside the function that uses it, which I don't ever call. I read on the mpi4py
documentation site that the mpi environment is initialized as soon as import mpi4py
is called, which I believe in turn messes up the mpirun
calls later. I am guessing these implicit imports still initialize the mpi environment
As Marco mentioned, geo_utils
does not import or use mpi4py
, but other codes like DVConstraints
do.
@akleb is geo_utils
the only pygeo subpackage you use in your cases? If so, would you try changing import pygeo
at the top of pyFoil
to import pygeo.geo_utils
and see if it works?
In the example I provided I am importing only geo_utils
functions and the issue still persists. Perhaps my computer has something else weird going on. Could you try that MWE I provided in the steps to reproduce?
I can replicate the issue on my machine using your MWE
You're right, I was misunderstanding how __init__
works.
One thing we could do is to add geo_utils
as a subpackage in setup.py
(which needs some other changes in repo structure accordingly), then I believe you can import pygeo.geo_utils
without importing mpi4py
.
(EDIT: I was not able to do this)
If this works, would it be enough for your cases, or do you need any other fixes?
Actually, it seems not possible to do the above in a backward compatible manner.
One thing we could do is to add
geo_utils
as a subpackage insetup.py
(which needs some other changes in repo structure accordingly), then I believe you canimport pygeo.geo_utils
without importingmpi4py
.
If this works, would it be enough for your cases, or do you need any other fixes?
If this was the case, then I believe it would work. I just needed to be able to import pygeo.geo_utils
without implicitly importing mpi4py
. It doesn't seem good practice for pyfoil
to import mpi4py
implicitly when it really isn't needed at all.
I believe the way to fix this is just not to include anything that requires optional dependencies (MPI, OpenVSP etc.) in the init. From what I can see this should not impact anyone who is using the "standard" pyGeo functionalities, but those using DVGeometryVSP
etc. will have to change their imports. I think this should be okay.
In general, unless you are actually using MPI functionality, it's almost always a bad idea to import mpi4py
.
But DVGeometry.py
and DVConstraints.py
also use MPI (though just a little), do you intend to refactor these codes and remove their dependencies on MPI?
I think all we need is to import mpi4py in the methods that use MPI, and remove the import at the beginning of the scripts.
I propose the following:
DVGeometry
: There is a singleallreduce
, so we can bring theMPI
import inside theif comm
blockDVConstraints
: The print statements can be probably removed IMO. As for the f2py wrapping, we can import MPI just inevalTriangulatedSurfConstraint
since onlygeograd
requires MPI.