Azrael3000/tmpi

`tmpi` just opens one shell/pane with no errors

ali-ramadhan opened this issue · 5 comments

Thank you @Azrael3000 for maintaining this awesome script! It's been a game-changer for debugging MPI.

I'm having issues setting it up on a new server though.

Running e.g. `tmpi 2 gdb` (or any other command with any number of ranks) just opens one pane with a bash shell and nothing else. I'm having a hard time figuring out the problem since there are no errors.

I'm using OpenMPI:

[wdmc@tartarus ~]$ mpiexec --version
mpiexec (OpenRTE) 4.0.3

Report bugs to http://www.open-mpi.org/community/help/


Hi Ali,
glad to hear that this tool is useful for you. It seems as if process 0 starts up properly as there is no output from tmpi.
What happens is that tmpi first starts a new window and then calls itself n times. Process 0 is a bit special, as it replaces the original window created by tmpi.
To check one thing, what is the content of /tmp/tmpi.lock?
Also in your newly created window, what are the values of the variables dummy, window and session?
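
A minimal sketch of how that information could be collected in the new window. The lock file path and the variable names `dummy`, `window` and `session` are taken from the questions above; the helper name `report_tmpi_state` is hypothetical, and the sketch assumes the three variables are visible in the shell where it runs.

```shell
# Hypothetical helper: print the tmpi lock file and the three variables.
# The default lock path is the one mentioned in this thread.
report_tmpi_state() {
    lock="${1:-/tmp/tmpi.lock}"
    if [ -e "$lock" ]; then
        echo "--- $lock ---"
        cat "$lock"
    else
        echo "$lock does not exist"
    fi
    # ${var-<unset>} prints <unset> only when the variable is not set at all,
    # so an empty value can be told apart from a missing one.
    echo "dummy=${dummy-<unset>}"
    echo "window=${window-<unset>}"
    echo "session=${session-<unset>}"
}

report_tmpi_state
```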

@Azrael3000 Thanks for the great tool!

@ali-ramadhan
I experienced a similar issue that might be related:

I was using tmpi in a Docker container, hence I was the root user.
With OpenMPI, this requires adding the option `--allow-run-as-root`
to mpirun. Otherwise, it terminates with an error.

Running tmpi out of the box produced your issue in this Docker container.

Adding --allow-run-as-root to the mpirun command in the tmpi script fixed it for me.
(Alternatively, you could probably also try to fix this issue by setting the environment variables

OMPI_ALLOW_RUN_AS_ROOT 
OMPI_ALLOW_RUN_AS_ROOT_CONFIRM

to 1 in your shell before calling tmpi. Haven't tried it myself yet.)
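
As a sketch, the environment-variable route would look roughly like this. It is untested with tmpi, as noted above; the `tmpi 2 gdb` call is the example from the original report and is left commented out so the snippet runs without tmpi installed.

```shell
# Open MPI expects both variables to be set to 1 before it runs as root.
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

# Then start tmpi from the same shell, for example:
# tmpi 2 gdb
```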

In any case, always make sure that mpirun runs as expected.
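
One way to do that check is to run a trivial command through mpirun on its own, outside tmpi, and look at the exit status. The sketch below wraps this in a small helper; the name `check_launch` is hypothetical, and the `mpirun -n 2 hostname` call is only an illustrative test command.

```shell
# Hypothetical helper: run a command and report whether it exited cleanly.
check_launch() {
    if "$@"; then
        echo "OK: $*"
    else
        status=$?   # exit status of the command that just failed
        echo "FAILED (exit $status): $*" >&2
        return "$status"
    fi
}

# Example (needs mpirun on the PATH):
# check_launch mpirun -n 2 hostname
```

If this already fails, for instance because of the root restriction described above, the error message shows up directly in the terminal instead of being hidden behind tmpi.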

Good to know @domcharrier
I suspect that we should get some proper error handling into TMPI to make it easier to figure out why mpirun fails. I'll see what I can come up with.

I added a new feature that closes the dummy window in case mpirun fails to launch. That should make it easier to see the error right away.

It's currently on the reptyr dev branch and should be getting into master "soon"