leobago/fti

Bugs in running tutorial examples

kakulo opened this issue · 1 comment

Hello Leo,

I am trying to use FTI for MPI fault tolerance, and I am following the README in /examples to run the examples. They do not work: FTI reports "Number of ranks is not a multiple of the node size". Could you please check? Thank you! The execution log is attached below.

[xxx@quartz:examples]$ make init
Built target init
[xxx@quartz:examples]$ make hd
^C[ FTI Information ] : Reading FTI configuration file (config.fti)...
[ FTI Information ] : The execution ID is: 2020-04-08_17-30-43
[ FTI Warning 000000 ] : Number of ranks is not a multiple of the node size.
[ FTI Warning 000000 ] : FTI failed to pass the configuration test.
[ FTI Warning 000000 ] : Error => No such file or directory
[ FTI Warning 000000 ] : FTI failed to load configuration.
[ FTI Warning 000000 ] : Error => No such file or directory
Local data size is 512 x 515 = 4.000000 MB (4).
Target precision : 0.005000
Maximum number of iterations : 5000
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
Step : 0, error = 1.000000
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.
[ FTI Warning 000000 ] : FTI is not initialized.

Lenny

Hi Lenny,

Thank you for your interest in FTI. The message "Number of ranks is not a multiple of the node size" means that you have to set the "Node Size" parameter in the FTI configuration file to the number of MPI ranks per node. The total number of MPI ranks must then equal the number of nodes × the number of ranks per node (node size).
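
As a rough illustration of that rule, here is a minimal Python sketch of the arithmetic. It is not FTI code: the function name `check_rank_layout` and its arguments are hypothetical, and it only checks that the rank count you launch with is a whole multiple of the node size.

```python
def check_rank_layout(total_ranks: int, node_size: int) -> int:
    """Return the number of nodes implied by total_ranks and node_size.

    Hypothetical helper, not part of FTI. It mirrors the rule
    total ranks = number of nodes x ranks per node (node size).
    """
    if total_ranks <= 0 or node_size <= 0:
        raise ValueError("total_ranks and node_size must be positive")

    # divmod gives the node count and the ranks left over on a partial node.
    nodes, leftover = divmod(total_ranks, node_size)
    if leftover != 0:
        raise ValueError(
            f"{total_ranks} ranks is not a multiple of node size {node_size}"
        )
    return nodes


if __name__ == "__main__":
    # 8 ranks with 4 ranks per node corresponds to 2 full nodes.
    print(check_rank_layout(8, 4))
```

For example, launching 8 ranks with a node size of 4 passes this check, while launching 6 ranks with a node size of 4 does not, which is the situation the warning describes.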

Also, I recommend taking a look at our new tutorial online: https://fault-tolerance-interface.readthedocs.io/en/latest/tutorial.html

Hopefully, this one is a little clearer. You can also find the details of the configuration file here:
https://fault-tolerance-interface.readthedocs.io/en/latest/configuration.html

Thank you and regards!
Leo