What is MP-LAMP?
MP-LAMP stands for Massive Parallel LAMP, which is a parallel version of LAMP. LAMP stands for Limitless-Arity Multiple-testing Procedure.
Installation
Installation on Amazon Web Services (AWS)
To set up MP-LAMP on AWS, follow these steps:
- Launch an Amazon EC2 instance using the Amazon Linux image.
- Download mp-lamp.
- Uncompress it.
- Change to the top-level directory of the uncompressed archive.
- Run the following command:
$ bash aws/aws_installer_single.sh
Installation in a local environment
Currently, an Intel CPU and Linux are assumed. If you run into problems during installation, please send us the error message and a description of your environment.
Prerequisites
Tool | Recommended version |
---|---|
Compiler | g++ 4.3 or later |
MPI library | OpenMPI, MPICH, MVAPICH, or Intel MPI |
Build tool | SCons 2.0.0 or later (SCons requires Python) |
Boost | Boost 1.55.0 or later |
gflags | gflags 2.0 or later |
Notes:
- For gcc, 4.9.3 or later is preferable. Older versions of gcc produce slightly slower binaries.
- The latest version of gflags requires CMake to build. Users not familiar with CMake are advised to use gflags v2.0, which can be installed with configure and make.
Compilation
- Install the prerequisites listed above.
- Copy local.sample.cfg to local.cfg and edit it as needed.
  - [compilers]
    - single: compiler for non-parallel code (g++ or icpc)
    - parallel: compiler for MPI code (typically mpicxx)
    - options: additional compiler options (added to CXXFLAGS)
    - libs: additional library options (added to LDFLAGS)
  - [paths]
    - include and library paths
    - Not needed unless libraries are installed in non-default locations.
- Sample local.cfg
[compilers]
single=g++
parallel=mpicxx
# an example for linux.
option=-DGTEST_USE_OWN_TR1_TUPLE=1 -DHAVE_CLOCK_GETTIME
libs=-lrt
# an example for Mac
# option=-msse4.2 -mpopcnt -march=corei7
[paths]
# include=/path/to/your/include_directory
# library=/path/to/your/library_directory
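Since SCons build scripts are written in Python, the following hypothetical sketch shows how a local.cfg like the sample could be read with Python's standard configparser module. The function name load_local_cfg and the returned dictionary keys are illustrative assumptions, and the key names follow the sample file (option, libs, include, library); the project's actual SConstruct may read the file differently.

```python
import configparser
import shlex


def load_local_cfg(text):
    """Hypothetical reader for a local.cfg like the sample above.

    Returns compiler names plus flag lists; missing keys fall back to
    defaults. This is an illustration, not mp-lamp's SConstruct.
    """
    # No interpolation, so '%' in flags is not special. Whole-line
    # comments starting with '#' are skipped by configparser by default.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)

    compilers = cfg["compilers"] if cfg.has_section("compilers") else {}
    paths = cfg["paths"] if cfg.has_section("paths") else {}

    settings = {
        "single": compilers.get("single", "g++"),
        "parallel": compilers.get("parallel", "mpicxx"),
        # Extra compiler options (appended to CXXFLAGS).
        "CXXFLAGS": shlex.split(compilers.get("option", "")),
        # Extra library options (appended to LDFLAGS).
        "LDFLAGS": shlex.split(compilers.get("libs", "")),
        "CPPPATH": [],
        "LIBPATH": [],
    }
    # Optional non-default include and library directories.
    if "include" in paths:
        settings["CPPPATH"].append(paths["include"])
    if "library" in paths:
        settings["LIBPATH"].append(paths["library"])
    return settings
```

configparser splits each line at the first "=", so a value such as -DGTEST_USE_OWN_TR1_TUPLE=1 keeps its own "=" intact.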
- Once local.cfg is ready, go to the top directory of lamp_search and type
$ scons
or, for a parallel build with 4 jobs:
$ scons -j 4
This builds the parallel binary cont_lamp.
To build LAMP for binary features instead of continuous features, edit the SConscript in ./mp-main to build bin_lamp. (TO FIX)
Note: to run the parallel version, use mpiexec as shown in the following example.
Usage
- cont-lamp is run from the command line.
For 32 processes:
$ mpiexec -hostfile ${machinefile} -np 32 ./cont-lamp --item item_file.csv --pos positive_file.csv --a 0.05 --show_progress --log
  * --item: item data file
  * --pos: positive data file
  * --a: significance level (default 0.05)
  * --show_progress: turning this on is advised for long jobs.
  * --log: shows a breakdown of the execution time. Most users do not need it, but it can help find the cause when mp-lamp is unexpectedly slow.
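If you launch runs from a script, a small helper like the hypothetical one below can assemble the command. It uses only the options listed above; the function name build_lamp_command and its defaults (progress on, --log off) are choices made for this sketch. The two-process minimum reflects the note later in this README that the current version does not work with "mpiexec -np 1".

```python
def build_lamp_command(binary, item_file, pos_file, nprocs, alpha=0.05,
                       hostfile=None, show_progress=True, log=False):
    """Assemble an mpiexec argument list for cont-lamp (hypothetical helper).

    Only the command-line options documented in this README are used.
    """
    if nprocs < 2:
        # The current version does not work with a single process.
        raise ValueError("use at least two MPI processes")
    cmd = ["mpiexec"]
    if hostfile is not None:
        cmd += ["-hostfile", hostfile]
    cmd += ["-np", str(nprocs), binary,
            "--item", item_file, "--pos", pos_file, "--a", str(alpha)]
    if show_progress:
        cmd.append("--show_progress")
    if log:
        cmd.append("--log")
    return cmd
```

The resulting list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues with file names.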
Sample Toy Data
- Item data file format. By default, mp-lamp reads item data in the CSV format shown below. The first line holds the item names, and each remaining line starts with the transaction name followed by that transaction's value for each item.
#gene,TF1,TF2,TF3,TF4
A,1,1,1,0
B,1,1,1,0
C,1,0,0,1
D,0,0,0,0
E,1,1,1,0
F,1,0,0,0
G,1,1,1,1
H,0,0,0,0
I,0,1,0,1
J,0,0,1,0
K,0,0,0,1
L,0,0,0,1
M,0,0,0,1
N,1,1,1,0
O,0,0,0,0
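To check a data file before a long run, the item file layout can be parsed with Python's csv module. This is a minimal sketch, assuming binary data in which any non-zero value means the item is present; the function name read_item_file is hypothetical, and mp-lamp's own reader may behave differently.

```python
import csv


def read_item_file(lines):
    """Parse an item data file in the default CSV layout (hypothetical helper).

    `lines` is an open text file or any iterable of lines. Returns
    (item_names, transactions), where transactions maps each transaction
    name to the set of items with a non-zero value (binary data assumed).
    """
    reader = csv.reader(lines)
    header = next(reader)
    # header[0] labels the transaction column (e.g. "#gene").
    item_names = header[1:]
    transactions = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        name, values = row[0], row[1:]
        if len(values) != len(item_names):
            raise ValueError(f"transaction {name!r}: expected "
                             f"{len(item_names)} values, got {len(values)}")
        transactions[name] = {item for item, v in zip(item_names, values)
                              if float(v) != 0.0}
    return item_names, transactions
```

For a file on disk, open it with open("item_file.csv", newline="") and pass the file object.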
- Positive data file format. The positive data file corresponding to the item data file above is shown below. The first line must start with "#". The current version crashes if the number of lines does not match the item file.
#gene,expression
A,1
B,1
C,0
D,0
E,1
F,0
G,1
H,1
I,0
J,0
K,0
L,0
M,1
N,1
O,0
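Because a line-count mismatch crashes the current version, it can be worth checking the positive file against the item file first. The sketch below is hypothetical (check_positive_file is not part of mp-lamp). It checks the "#" header and the line count stated above, and it also requires the transaction names to appear in the same order; that name check is an extra precaution assumed here, since this README does not say whether mp-lamp matches lines by name or by position.

```python
import csv


def check_positive_file(item_lines, pos_lines):
    """Check that a positive data file lines up with its item data file.

    Both arguments are iterables of text lines (e.g. open file objects).
    Returns the set of transaction names whose positive value is 1.
    """
    item_rows = [r for r in csv.reader(item_lines) if r]
    pos_rows = [r for r in csv.reader(pos_lines) if r]
    if not pos_rows or not pos_rows[0][0].startswith("#"):
        raise ValueError('the first line of the positive file must start with "#"')
    # Drop the header line of each file.
    item_names = [r[0] for r in item_rows[1:]]
    pos_body = pos_rows[1:]
    if len(pos_body) != len(item_names):
        raise ValueError(f"{len(pos_body)} positive lines vs "
                         f"{len(item_names)} transactions")
    positives = set()
    for expected, row in zip(item_names, pos_body):
        if len(row) < 2:
            raise ValueError(f"missing value for {row[0]!r}")
        # Assumed precaution: names must appear in the same order.
        if row[0] != expected:
            raise ValueError(f"transaction order differs: {row[0]!r} vs {expected!r}")
        if int(row[1]) == 1:
            positives.add(row[0])
    return positives
```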
Sample usage and output
- Sample command and output of the parallel version running with 2 processes on the toy data. Remember to launch the command with "mpiexec" or "mpirun".
$ mpiexec -np 2 ./mp-lamp --item ./samples/sample_data/sample_item.csv --pos ./samples/sample_data/sample_expression_over1.csv --a 0.05 --show_progress
# item file : ./samples/sample_data/sample_item.csv
# positive file: ./samples/sample_data/sample_expression_over1.csv
# # of transactions= 15 # of items= 4 # of total positives= 7 max freq= 7 max positive= 5 max items in trans.= 4
# preprocess end
# lambda=6 cs_thr[lambda]= 7 pmin_thr[lambda-1]= 0.00699301 num_expand= 1 elapsed_time=0.000616
# 1st phase start
# lambda=6 closed_set_num[n>=lambda]= 4 cs_thr[lambda]= 7 pmin_thr[lambda-1]= 0.00699301 num_expand= 1 elapsed_time=0.000661
# 1st phase end
# lambda=6 num_expand= 2 elapsed_time=0.001023
# 2nd phase start
# lambda=5 int_sig_lev=0.0125 elapsed_time=0.001052
# 2nd phase end
# closed_set_num= 5 sig_lev=0.01 num_expand= 3 elapsed_time=0.001165
# 3rd phase end
# sig_lev=0.01 elapsed_time=0.001564
# time all= 0.006031 time search= 0.001858
# min. sup=5 correction factor=5
# number of significant patterns=1
# pval (raw) pval (corr) freq pos # items items
0.00699301 0.034965 5 5 3 TF1 TF2 TF3
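To see where the numbers in this output come from, the brute-force sketch below reproduces them for the toy data. It assumes a one-sided Fisher's exact test, which gives exactly the raw p-value shown (21/3003 ≈ 0.00699301), and a minimum-support rule formulated here as: the largest λ with k(λ)·f(λ−1) > α, where k(λ) counts closed patterns with support at least λ and f(x) is the smallest p-value a pattern of support x can reach. The correction factor is then k(λ). All function names are hypothetical, and the sketch enumerates every itemset, so it only works for a handful of items and is not how mp-lamp searches.

```python
from itertools import combinations
from math import comb

# Toy data from this README: items with value 1 in each transaction.
ITEMS = {
    "A": {"TF1", "TF2", "TF3"}, "B": {"TF1", "TF2", "TF3"}, "C": {"TF1", "TF4"},
    "D": set(), "E": {"TF1", "TF2", "TF3"}, "F": {"TF1"},
    "G": {"TF1", "TF2", "TF3", "TF4"}, "H": set(), "I": {"TF2", "TF4"},
    "J": {"TF3"}, "K": {"TF4"}, "L": {"TF4"}, "M": {"TF4"},
    "N": {"TF1", "TF2", "TF3"}, "O": set(),
}
# Transactions whose expression value is 1 in the positive file.
POSITIVES = {"A", "B", "E", "G", "H", "M", "N"}


def fisher_upper_pvalue(n_total, n_pos, support, pos_in_support):
    """Assumed test, one-sided Fisher's exact: P(X >= pos_in_support) for
    hypergeometric X (`support` draws from `n_total` with `n_pos` positives)."""
    top = min(support, n_pos)
    hits = sum(comb(n_pos, k) * comb(n_total - n_pos, support - k)
               for k in range(pos_in_support, top + 1))
    return hits / comb(n_total, support)


def min_pvalue(n_total, n_pos, support):
    """f(support): smallest p-value any pattern with this support can reach."""
    return fisher_upper_pvalue(n_total, n_pos, support, min(support, n_pos))


def closed_itemsets(transactions):
    """Brute force: map each closed itemset to the transactions containing it."""
    all_items = sorted(set().union(*transactions.values()))
    closed = {}
    for r in range(1, len(all_items) + 1):
        for cand in combinations(all_items, r):
            occ = frozenset(t for t, its in transactions.items() if set(cand) <= its)
            if occ:
                # Closure = items shared by every transaction containing cand.
                closure = frozenset.intersection(
                    *(frozenset(transactions[t]) for t in occ))
                closed[closure] = occ
    return closed


def lamp(transactions, positives, alpha=0.05):
    """Return (min. support, correction factor, significant patterns)."""
    n_total, n_pos = len(transactions), len(positives)
    closed = closed_itemsets(transactions)
    supports = [len(occ) for occ in closed.values()]

    def k(lam):  # number of closed patterns with support >= lam
        return sum(s >= lam for s in supports)

    # Assumed rule: largest lam with k(lam) * f(lam - 1) > alpha.
    sigma = max(lam for lam in range(1, max(supports) + 1)
                if k(lam) * min_pvalue(n_total, n_pos, lam - 1) > alpha)
    factor = k(sigma)
    significant = []
    for pattern, occ in closed.items():
        if len(occ) >= sigma:
            p = fisher_upper_pvalue(n_total, n_pos, len(occ), len(occ & positives))
            if p * factor <= alpha:
                significant.append((sorted(pattern), p, p * factor))
    return sigma, factor, significant
```

Calling lamp(ITEMS, POSITIVES) gives a minimum support of 5, a correction factor of 5, and the single pattern TF1 TF2 TF3 with raw p-value ≈ 0.00699301 and corrected p-value ≈ 0.034965, matching "min. sup=5 correction factor=5" and the result line above. It also finds 4 closed patterns with support of at least 6, consistent with "closed_set_num[n>=lambda]= 4" at lambda=6.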
Notes
- The current version does not work with "mpiexec -np 1". Use at least two processes for the parallel version.
- The current version targets data with a small number of transactions. For data with more than 100,000 transactions, please wait for future updates.
Optional
- mpiP (http://mpip.sourceforge.net/) is recommended for profiling MPI programs.
Contact
Please contact the following for bug reports, comments, or requests.
- yoshizoe(AT)acm.org
- ddyuudd(AT)gmail.com
License
MP-LAMP is an open-source project licensed under the Revised BSD License.
Author
Kazuki Yoshizoe implemented MP-LAMP for binary features. Yuu Jinnai modified the code to support datasets with continuous features.