/collfs

Joint ANL/KSL collaboration on collective file system module (with glibc dynamic library interception)

***********************************************************************************************
# What is collfs?

collfs is a user library and set of patches to glibc that enable file system calls to be handled collectively over an
MPI communicator.  collfs was written with the specific goal of providing scalable dynamic loading on supercomputers
whose interconnect is faster than their file system I/O, but it is a general-purpose tool that can be used either directly as a
library or implicitly by wrapping file system calls with collective versions.

The collfs patches are currently quite minimal.  They insert an externally visible struct object into the run-time
dynamic linker containing function pointer hooks for collective MPI versions of the standard file system API.  These function pointers
are then used (when non-NULL) to replace file system function calls within the standard dynamic loading routines in glibc.  Additionally,
we provide a small C library, loaded via LD_PRELOAD, that initializes MPI (this happens before main, so it passes NULL to the Init
routines) and points the function hooks at our collective versions of the standard file system API,
shadowing the original file system functions (but utilizing them by default).
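
The hook mechanism described above can be sketched roughly as follows.  This is a minimal illustration, not the actual
collfs or glibc code: the struct layout and every `sketch_` name are hypothetical, and the "collective" open is a
stand-in that only counts calls and then falls through to the ordinary open(), whereas a real version would coordinate
over an MPI communicator.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical hook table; the struct in the patched loader may look different. */
struct sketch_collfs_hooks {
    int (*open)(const char *path, int flags, int mode);
    int (*close)(int fd);
};

/* Externally visible object; every hook starts out NULL (inactive). */
struct sketch_collfs_hooks sketch_hooks = { NULL, NULL };

/* Counts calls that went through the stand-in collective open. */
int sketch_collective_opens = 0;

/* Stand-in for a collective open: records the call, then uses the
 * original file system function, as the default behaviour would. */
static int sketch_collective_open(const char *path, int flags, int mode)
{
    assert(path != NULL);
    sketch_collective_opens++;
    return open(path, flags, mode);
}

/* Loader-side call sites: use the hook when non-NULL, else the plain call. */
int sketch_loader_open(const char *path, int flags, int mode)
{
    if (sketch_hooks.open != NULL)
        return sketch_hooks.open(path, flags, mode);
    return open(path, flags, mode);
}

int sketch_loader_close(int fd)
{
    if (sketch_hooks.close != NULL)
        return sketch_hooks.close(fd);
    return close(fd);
}

/* Preload-side activation.  In a preloaded library this would run from a
 * pre-main initializer, after something like MPI_Init(NULL, NULL). */
void sketch_install_hooks(void)
{
    sketch_hooks.open = sketch_collective_open;
    /* close is left NULL here, so it keeps using the plain close(). */
}
```

Before sketch_install_hooks() is called, sketch_loader_open() behaves exactly like open(); afterwards it routes through
the collective stand-in.  Leaving a hook NULL keeps the unpatched behaviour for that call, so hooks can be activated one
at a time.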

***********************************************************************************************
# Installing collfs on a Fedora Core 5 system

Fedora Core 5 is, at this point, quite old, but it ships glibc 2.4, which is very close to the glibc 2.4
installed on the IBM Blue Gene/P.

This assumes you have an FC5 installation with the standard development RPMs installed.  If you are unable to build
glibc-2.4 from the SRPM, you will probably need to install the missing RPMs from the DVD or a web repository.

Tips for setting up a non-root installation environment for SRPMs can be found here:
http://www.owlriver.com/tips/non-root//

mkdir -p ~/sandbox/glibc
cd ~/sandbox/glibc
wget http://dl.dropbox.com/u/65439/glibc-2.4-11-collfs.src.rpm
rpmbuild --nodeps --rebuild -bc glibc-2.4-11-collfs.src.rpm

This should leave you with a built glibc (good for testing) in the RPM build directory, which
was /var/tmp/glibc-2.4-root on my machine.

***********************************************************************************************
# Building glibc-2.4 for the compute nodes:

tmpdir=/tmp/aron-glibc-bgp-build
prefix=/home/aron/bgpsys
build=powerpc-linux-gnu
# download and patch glibc sources

mkdir -p $tmpdir && cd $tmpdir
curl -O ftp://ftp.gnu.org/gnu/glibc/glibc-2.4.tar.gz
tar -zxvf glibc-2.4.tar.gz

# apply IBM patches to glibc

patch -p2 -E < /bgsys/drivers/ppcfloor/toolchain/glibc-2.4.diff

# temporary linux headers
mkdir -p $tmpdir/templinuxheaders-build/include && \
cp -f -r -L /usr/include/asm \
	 $tmpdir/templinuxheaders-build/include/asm && \
cp -f -r -L /usr/include/asm-ppc \
	 $tmpdir/templinuxheaders-build/include/asm-ppc && \
cp -f -r -L /usr/include/asm-generic \
	$tmpdir/templinuxheaders-build/include/asm-generic && \
cp -f -r -L /usr/include/linux \
	 $tmpdir/templinuxheaders-build/include/linux

# configure using default-shared gcc specs
mkdir -p $tmpdir/glibc-2.4-build
cd $tmpdir/glibc-2.4-build &&  \
    PATH=/usr/gnu/bin:/usr/bin:/bin:/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/gnu-linux/bin/ LD_LIBRARY_PATH=    \
    CC="powerpc-bgp-linux-gcc -specs=/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/gnu-linux/lib/gcc/powerpc-bgp-linux/4.1.2/specs.orig" \
    libc_cv_ppc_machine=yes  \
    dd1_workarounds=  \
    libc_cv_forced_unwind=yes  \
    libc_cv_c_cleanup=yes  \
    libc_cv_slibdir="/lib" \
    $tmpdir/glibc-2.4/configure   \
    --prefix=$prefix \
    --sysconfdir="/etc"   \
    --libdir="/lib" \
    --bindir="/bin"  \
    --datadir="/usr/share" \
    --libexecdir="/libexec"   \
    --build=$build  \
    --host=powerpc-bgp-linux   \
    --enable-shared  \
    --enable-multilib \
    --without-cvs \
    --without-gd \
    --with-elf \
    --with-fp=yes \
    --enable-add-ons=powerpc-cpu,nptl \
    --with-cpu=450fp2   \
    --with-tls \
    --with-__thread   \
    --enable-__cxa_atexit \
    --with-headers=$tmpdir/templinuxheaders-build/include

# patch makeconfig

# make
cd $tmpdir/glibc-2.4-build && PATH=/usr/gnu/bin:/usr/bin:/bin:/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/gnu-linux/bin/ LD_LIBRARY_PATH= \
    make 2>&1 | tee make.log

# make install
cd $tmpdir/glibc-2.4-build && PATH=/usr/gnu/bin:/usr/bin:/bin:/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/gnu-linux/bin/ LD_LIBRARY_PATH= \
    make install 2>&1 | tee install.log


***********************************************************************************************
See shaheen/build_glibc_2.4-sles-10.sh for the basic idea of how to patch the SLES 10
glibc with collfs.  I am working from the patched source, which you can obtain from the SLES
10 src.rpm by unpacking the distribution and then applying all the patches (just before the
configure stage).

If you want to run an executable with the updated dynamic loader (the ELF interpreter), you need to
launch your executable using the installed ld64.so.1 like this:

/home/aron/sys/lib/ld64.so.1 ./main

***********************************************************************************************
Some notes for building glibc on Shaheen:

../configure --prefix=/home/aron/sys --libexecdir=/home/aron/sys/lib64 \
    --infodir=/home/aron/sys/share/info --enable-add-ons=nptl,libidn,dceext \
    --srcdir=.. --without-cvs --with-headers=/home/aron/src/rpm/BUILD/kernel-headers \
    --build ppc64-suse-linux --host ppc64-suse-linux --with-tls --with-__thread \
    --enable-kernel=2.6.4 CC="gcc -m64" 2>&1 | tee configure.log

make 2>&1 | tee make.log

# make install wants a local ld.so.conf, so copy one over from /etc or build it by hand
cp ~/sandbox/collfs.git/shaheen/ld.so.conf ~/sys/etc/ld.so.conf
make install

***********************************************************************************************
Anticipated changes, listed per file as line number: call to be hooked.

dl-load.c

793: __close
837: __fxstat64
858: __close
887: __close
912: __close
975: __lseek
976: __libc_read
1185: __mmap
1236: __mmap
1350: __munmap
1418: __close
1617: __open
1629: __libc_read
1727: __lseek
1728: __libc_read
1745: __lseek
1746: __libc_read
1761: __close
1872: __fxstat64
1878: __close
1899: __close
2118: __close

dl-close.c

498: DL_UNMAP
***********************************************************************************************