/MPI

Message Passing Interface

Primary Language: Python

Basic Setup on Master cluster:

Install mpich

sudo apt-get update && sudo apt-get install mpich

Configure the /etc/hosts file. The master node must list the IP address and identity (hostname) of every slave node and of the master node itself, so that mpirun or mpiexec can reach those hosts when executing a program in parallel.

Add a user “mpiuser”. Having mpiuser on all nodes is preferred, since it reduces possible confusion.

    sudo adduser mpiuser

Make mpiuser a sudoer

    sudo usermod -a -G sudo mpiuser

Set up an SSH server so that the master and slave nodes can exchange messages.

    sudo apt-get install openssh-server

In order for mpirun / mpiexec to execute a program in parallel smoothly, a passwordless ssh environment must be set up. Setup steps are as follows:

    su - mpiuser
    ssh-keygen -t rsa        # (*) or -t dsa
    ssh-copy-id client       # (**) an IP address may also be used
    eval `ssh-agent`
    ssh-add ~/.ssh/id_rsa

(*) -t dsa might be needed only if your system is very outdated. Current OpenSSH releases disable DSA keys by default, so prefer -t rsa.
(**) client is defined in the machine file or in /etc/hosts, e.g.

    cat /etc/hosts
    192.168.x.x    client
    127.0.0.1      localhost

If all the steps above were done correctly, the master should be able to ssh into the slave nodes without a password. If it does not work, remove the generated keys (rm ~/.ssh/id*) and redo the steps above.
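
The passwordless check can also be scripted. The following is a minimal sketch, assuming the ssh client is on the PATH of the master; the helper names and the 5-second timeout are illustrative and not part of the original setup. BatchMode=yes makes ssh fail instead of prompting for a password, so a zero exit code means key-based login worked.

```python
import subprocess


def build_ssh_check_command(host, user="mpiuser", timeout_s=5):
    """Return an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt; fail if a password is needed
        "-o", f"ConnectTimeout={timeout_s}",  # illustrative timeout, in seconds
        f"{user}@{host}",
        "true",                               # trivial remote command
    ]


def can_ssh_without_password(host, user="mpiuser"):
    """Run the check; True means the remote command ran without any prompt."""
    result = subprocess.run(
        build_ssh_check_command(host, user),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Calling can_ssh_without_password("client") on the master should return True once the key has been copied to that node.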

Once a passwordless ssh channel is created, a shared directory (NFS in this case) needs to be created so that the nodes can exchange files. Passwordless ssh must be working before proceeding to this step. Steps are as follows:

    sudo apt-get install nfs-kernel-server

Create a folder named “cloud” under mpiuser’s home directory:

    mkdir ~/cloud

Add the entry “/home/mpiuser/cloud *(rw,sync,no_root_squash,no_subtree_check)” to /etc/exports. The path must be absolute, and writing to /etc/exports requires root:

    echo "/home/mpiuser/cloud *(rw,sync,no_root_squash,no_subtree_check)" | sudo tee -a /etc/exports

Reapply /etc/exports by running

    sudo exportfs -a
    sudo service nfs-kernel-server restart
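
To avoid appending the same export twice, the entry can be generated and checked before it is written. This is a rough sketch, not part of the original setup: the function names are hypothetical, and the check is a plain string comparison that treats differently spaced lines as different entries.

```python
def exports_entry(path, clients="*",
                  options=("rw", "sync", "no_root_squash", "no_subtree_check")):
    """Format one /etc/exports line: <absolute path> <clients>(<options>)."""
    if not path.startswith("/"):
        raise ValueError(f"export path must be absolute: {path!r}")
    return f"{path} {clients}({','.join(options)})"


def has_entry(exports_text, entry):
    """True if the entry already appears, ignoring comments and blank lines."""
    for line in exports_text.splitlines():
        if line.split("#", 1)[0].strip() == entry:
            return True
    return False
```

For example, has_entry(open("/etc/exports").read(), exports_entry("/home/mpiuser/cloud")) tells whether the line still needs to be added.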

Basic Setup on Slave cluster

A shared directory must be configured on each slave cluster as well. Steps are as follows:

    sudo apt-get update && sudo apt-get install mpich
    sudo apt-get install nfs-common

Create a folder named “cloud” under mpiuser’s home directory on the slave node and mount the master’s shared directory (cloud) on it.

    mkdir ~/cloud
    sudo mount -t nfs master:/home/mpiuser/cloud ~/cloud

The step above requires that /etc/hosts on the slave node contains the IP address and identity of the master and of the slave node itself.

Add the following entry to /etc/fstab so that the shared folder is mounted again after a reboot: “master:/home/mpiuser/cloud /home/mpiuser/cloud nfs”. Writing to /etc/fstab requires root.

    echo "master:/home/mpiuser/cloud /home/mpiuser/cloud nfs" | sudo tee -a /etc/fstab
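
Whether the share is actually mounted on a slave node can be checked from the kernel's mount table. The sketch below assumes a Linux system where /proc/mounts lists one mount per line as "device mount-point fstype ..."; the function name is illustrative.

```python
def is_nfs_mounted(mounts_text, source, mount_point):
    """True if `source` is mounted at `mount_point` with an NFS filesystem type."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        device, target, fstype = fields[:3]
        # fstype is reported as "nfs" or "nfs4" depending on the protocol version
        if device == source and target == mount_point and fstype.startswith("nfs"):
            return True
    return False
```

On a slave node this could be called as is_nfs_mounted(open("/proc/mounts").read(), "master:/home/mpiuser/cloud", "/home/mpiuser/cloud").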

Once the basic environment is set up, the MPI program can be executed from the master node. -np takes the number of processes to start (for example, the number of cores), and -hosts takes identities defined in /etc/hosts. Methods are as follows:

    mpirun -np 1 -hosts master,client ./mpi_executable
    mpirun -np 1 ./mpi_executable    # Running on the master only.
    mpirun -np 1 --hostfile /etc/hosts ./mpi_executable
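
When the launch is driven from a Python script, the command line can be assembled programmatically. This is a small sketch that uses only the -np and -hosts options shown above; the function names are hypothetical, and the resulting list is meant to be passed to subprocess.run on the master.

```python
import shlex


def build_mpirun_command(executable, num_procs, hosts=None):
    """Assemble an mpirun argument list; `hosts` are identities from /etc/hosts."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    cmd = ["mpirun", "-np", str(num_procs)]
    if hosts:
        cmd += ["-hosts", ",".join(hosts)]  # omit to run on the master only
    cmd.append(executable)
    return cmd


def format_command(cmd):
    """Render the argument list as a shell-safe string, e.g. for logging."""
    return " ".join(shlex.quote(part) for part in cmd)
```

For instance, format_command(build_mpirun_command("./mpi_executable", 2, ["master", "client"])) yields the same form as the first command above, with two processes.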

The following is the current /etc/hosts file on the master node:

    cat /etc/hosts
    127.0.0.1    localhost
    #127.0.1.1  master_identity
    192.168.0.31    master
    192.168.0.32    client
    192.168.0.33    client2
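
A listing like this can be turned into a list of cluster node names, for example to write a machine file with one hostname per line. The sketch below is an assumption-laden helper rather than part of the setup: it treats every non-loopback entry as a cluster node, and the function names are illustrative.

```python
def parse_hosts(text):
    """Map each hostname to its IP address, skipping comments and blank lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)      # first definition wins
    return mapping


def cluster_nodes(mapping):
    """Hostnames whose address is not a loopback (127.x.x.x) entry, in file order."""
    return [name for name, ip in mapping.items() if not ip.startswith("127.")]


SAMPLE_HOSTS = """\
127.0.0.1    localhost
#127.0.1.1  master_identity
192.168.0.31    master
192.168.0.32    client
192.168.0.33    client2
"""
```

With the sample above, "\n".join(cluster_nodes(parse_hosts(SAMPLE_HOSTS))) gives the node names one per line.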