### PTMPI

Zoran Dimitrijevic zoran@cs.ucsb.edu


How to make:

Go to the src directory and type make

This will build the ptmpi executable, with support for the block-based matrix multiplication program.

What to change:

Right now the startup procedure is not completed, so you first need to change, in runptmpi.c in the src directory, the directory from which ptmpi will be started on the cluster.

How to run:

Manually:

Log in to each node on which you want to start ptmpi and type: ./ptmpi i conffile args

Please start the processes in order, from i = 0 to p-1. Startup should work in any order, but I did not have time to debug the code, and sometimes one process dies during init and then the rest just stay blocked on select. I will change the startup ASAP.

Automated startup:

e.g. start with:

./runptmpi conf_ptmpi.rc4x4 2048 32

Note that right now you need to change the paths to local_script and to the configuration file; this could be changed to use Unix environment variables.
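One way the environment-variable idea could look is sketched below. This is an assumption about a possible change, not what runptmpi does today; the helper name path_from_env and the variable name PTMPI_CONF in the comment are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of ptmpi: return the value of the
 * environment variable `name`, or `fallback` if it is unset or empty. */
const char *path_from_env(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : fallback;
}

/* Possible use (PTMPI_CONF is an assumed variable name):
 *   const char *conf = path_from_env("PTMPI_CONF", "conf_ptmpi.rc4x4");
 */
```

With something like this the compiled-in path stays as the default, and a user only sets the variable when their layout differs.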

PTMPI configuration file:

conf_ptmpi.rcxx should contain the configuration for the ptmpi system layout. The structure is as follows:

...

This configuration file must exist in order to run ptmpi. However, you do not need to use runptmpi; you can start the processes manually on each node you are using.

runptmpi will execute the local_start shell script with the appropriate arguments on each node. You can also edit the local_start script to change the log file for each node. The initial value is /home/local_home/zoran/nodeXX.log, where XX is the ptmpi process number (supplied to ptmpi as the first argument).
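To make the log naming concrete, here is a small sketch of how a nodeXX.log path could be built from a directory and a process number. The real naming is done in the local_start script; the helper name node_log_path and the two-digit zero padding of XX are assumptions.

```c
#include <stdio.h>

/* Hypothetical helper, not part of ptmpi: write "<dir>/nodeXX.log" into buf,
 * where XX is the process number (assumed here to be zero-padded to 2 digits).
 * Returns 0 on success, -1 if buf is too small or formatting fails. */
int node_log_path(char *buf, size_t size, const char *dir, int rank)
{
    int n = snprintf(buf, size, "%s/node%02d.log", dir, rank);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

For example, node_log_path(buf, sizeof buf, "/home/local_home/zoran", 3) would fill buf with /home/local_home/zoran/node03.log under that padding assumption.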

Output:

If you run ptmpi manually, output will go to your tty. There are some problems with NFS right now, so for the moment I am logging to local disk. You will need to change the directory in the local_start script; right now it points to zoran's directory. The configuration file is read from NFS, and output should work there as well, but I did not have time to test that. To use NFS, specify in local_start the same directory from which ptmpi is started.

Known BUGS:

I am using the REUSE bit for listening sockets. This works fine most of the time.
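For readers unfamiliar with the option, the sketch below shows the usual way a listening socket is opened with address reuse enabled. It assumes the "REUSE bit" means the standard SO_REUSEADDR socket option, and open_listener is a hypothetical helper, not a function from the ptmpi source.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of ptmpi: open a TCP listening socket on
 * `port` with SO_REUSEADDR set. Returns the socket fd, or -1 on error. */
int open_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow bind() to succeed while old connections on this port
     * are still in TIME_WAIT from a previous run. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The option must be set before bind. It does not help when another live process is still listening on the same port, which is one reason to kill stalled processes before restarting.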

The startup procedure is not fail-safe, so sometimes there will be some blocked processes left hanging around. To kill all processes on the cluster use: killnodes ptmpi

killnodes is a script included with the code.

If ptmpi does not seem to run, you can try changing the port numbers in the configuration file, or just wait a little (which works fine most of the time). Before starting, kill all stalled processes, if any. This is not a problem with the MPI implementation but with the startup (connection establishment) part of the code, which is not properly debugged.

The results of the multiplication are checked, and they are correct. The program is still very much in beta, and is not properly debugged.

If something seems to be wrong, please start the processes manually, with debug turned on (not all of it; see the code for details). Startup will always work fine if the processes are started in order from 0 to p-1.