jaheyns/CfdOF

[Feature request] Support mpi hostfile for clustered setup

Closed this issue · 11 comments

I successfully managed to run a clustered setup with CfdOF as the GUI. I added --hostfile my_hosts to the mpirun command, and set up a shared directory for the case folder.
I wish there were a small GUI option to add the --hostfile argument to mpirun.
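
For reference, a minimal sketch of the kind of manual invocation described above; the core count and solver are illustrative placeholders, not taken from the original setup:

# run from a case directory shared across all nodes (e.g. over NFS)
mpirun --hostfile my_hosts -np 8 simpleFoam -parallel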

I can take a look at this, potentially over the weekend. @einhander @oliveroxtoby can you give a bit more info as to what is wanted, maybe a couple of screenshots or an illustration if you have time? It shouldn't be too much effort if I understand the basic requirement correctly.

@icojb25 here is a photo-montage of what I mean:
[screenshot: mock-up of the proposed hostfile option in the GUI]

The corresponding option for mpiexec:

mpiexec --hostfile mpi_hostfile -np $nproc "$exe" -parallel "$@" 1> >(tee -a log."$sol") 2> >(tee -a log."$sol" >&2)

The --hostfile option and the mpi_hostfile name should be used together in the case of a clustered setup, and mpirun should be run without them for a local parallel run.
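
A minimal sketch of how the execute line could branch on this; the MPI_HOSTFILE variable is hypothetical (not part of CfdOF), and the log redirection from the original line is omitted for brevity:

# use the hostfile only when one has been configured (clustered run)
if [ -n "$MPI_HOSTFILE" ]; then
    mpiexec --hostfile "$MPI_HOSTFILE" -np $nproc "$exe" -parallel "$@"
else
    mpiexec -np $nproc "$exe" -parallel "$@"
fi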

Hi @einhander ... how is mpi_hostfile getting populated? Are there any other changes required apart from changing the execute line above?

@oliveroxtoby did you have any thoughts about how you might want this implemented ... in a GUI panel, or something above? I'll have to see if I can remember how the GUI templating works :)

@icojb25 thanks for looking at this. I'd just add a property to the analysis object for the hostfile to be specified. If not blank, it should add this option (in both parallel meshing and solving). Wouldn't want to clutter the GUI task panel pages with it as it will be a seldom used power user option.

Hi @oliveroxtoby @einhander, take a look at the above. I've set it up for Linux only at this stage; if it's fine, I will update Allrun.ps1 and Allrun.bat as well ...

@icojb25

how is mpi_hostfile getting populated?

It's a plain text file with the hostnames or IPs of the cluster nodes, optionally with the number of CPUs per node.
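
For example, assuming OpenMPI (where slots= caps the number of processes per node), such a hostfile might look like this; the hostnames and slot counts are illustrative:

# one cluster node per line, optionally with a CPU count
node1 slots=4
node2 slots=4
192.168.0.12 slots=8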

take a look at the above

Thanks, I'll try it ASAP.

@icojb25 It works fine on my Linux box with both Use Hostfile=true and false.
On second thought, I think the Use Hostfile and Hostfile Name settings should be moved to CfdSolver's Solver section.
The default value for Hostfile Name should be ../mpi_hostfile, so that the file isn't overwritten when the case is recreated.

Yeah, I'm aware of what it is; my question was how it is being populated / generated, since this normally comes from the job scheduler, or perhaps you are writing this manually for your own cluster. I guess the question was whether we assume it already exists - which I assume we will.
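
As a sketch of the scheduler case: under SLURM, for instance, the file could be generated at job start instead of written by hand (PBS similarly provides a ready-made $PBS_NODEFILE):

# write one hostname per line from the current SLURM allocation
scontrol show hostnames "$SLURM_JOB_NODELIST" > mpi_hostfile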

Great, thanks for the confirmation. @oliveroxtoby Lmk what you think of changing the location ... I followed the original suggestion. :)

@icojb25

you are writing this manually for your own cluster.

You are right, I'm writing it manually.

I'd prefer it to remain under the analysis object, as it should apply to both the solver and the mesher when running snappyHexMesh or cfMesh in MPI parallel mode.

Got it, thanks @oliveroxtoby, and for the confirmation @einhander. I will push an update to the filename ../mpi_hostfile and then I guess we can merge it, since it seems to work. Cheers 👍