Parallel LSQR version 3

===============
 * Parallel LSQR v3 (PLSQR3)
 *
 * (Hwang-Ho) He Huang
 * huanghe.us@gmail.com
 * Liqiang Wang
 * wang@cs.uwyo.edu; lwangcs@gmail.com
 * Department of Computer Science, University of Wyoming
 *
 * John M. Dennis
 * dennis@ucar.edu
 * National Center for Atmospheric Research. Boulder, CO
 *
 * En-Jui Lee (rickli92@gmail.com)
 * Po Chen  (pchen@uwyo.edu ; pochengeophysics@gmail.com)
 * Department of Geology and Geophysics
 * University of Wyoming
 *
 * Last update: 9/24/2013
 *
 * References:
 *
 * En-Jui Lee, He Huang, John M. Dennis, Po Chen, Liqiang Wang,
 * An optimized parallel LSQR algorithm for seismic tomography,
 * Computers & Geosciences, Volume 61, December 2013, Pages 184-197,
 * ISSN 0098-3004, http://dx.doi.org/10.1016/j.cageo.2013.08.013.
 * (http://www.sciencedirect.com/science/article/pii/S0098300413002409)
 *
 * Huang, H., Dennis, J.M., Wang, L., Chen, P., 2013.
 * A scalable parallel LSQR algorithm for solving large-scale linear system for tomographic problems: a case study in seismic tomography.
 * In: Proceedings of the 2013 International Conference on Computational Science (ICCS). Procedia Computer Science.
 *
 * He Huang, Liqiang Wang, En-Jui Lee, and Po Chen.
 * An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR).
 * In the 2012 International Conference on Computational Science (ICCS) (main track).
 * Procedia Computer Science, Elsevier, 2012.
 * 

=================
||PLSQR3 Manual||
=================

Source code of PLSQR3: PLSQR3.2013.09/source
Tools for running PLSQR3: PLSQR3.2013.09/PLSQR3_tools
Example dataset for testing PLSQR3: PLSQR3.2013.09/data

A. Input files of the kernel matrix
  (1). Kernel matrix data (stores ONLY the non-zero elements of the kernel matrix)
       1-based row and column indexing, sorted by COLUMN
       Format: binary
       (the programs in PLSQR3.2013.09/PLSQR3_tools/kernel_format can generate these inputs; see E(1))
       Example (shown in ASCII):
       rowIdx(int) colIdx(int) value(double)
       4           8           7.708820e-01
       5           8           9.082630e-01
       3           10          2.271540e-01
       7           25          6.604270e-01
       1           26          6.365470e-01
       9           26          4.711560e-01
       ..........
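The exact byte layout of a record is not spelled out above. A minimal sketch of writing and reading these binary triplets, assuming each non-zero is stored as a 4-byte int row index, a 4-byte int column index and an 8-byte double value, in native byte order. The layout and the names `KerEntry`, `write_entries` and `read_entries` are illustrative assumptions; check the reader in PLSQR3.2013.09/source before relying on them.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical record: int row, int col, double value (1-based indices). */
typedef struct {
    int32_t row;
    int32_t col;
    double  val;
} KerEntry;

/* Write n entries field by field, so the file layout does not depend on
   struct padding. Returns the number of entries fully written. */
size_t write_entries(FILE *fp, const KerEntry *e, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (fwrite(&e[i].row, sizeof e[i].row, 1, fp) != 1 ||
            fwrite(&e[i].col, sizeof e[i].col, 1, fp) != 1 ||
            fwrite(&e[i].val, sizeof e[i].val, 1, fp) != 1)
            break;
    }
    return i;
}

/* Read up to max entries in the same field order; returns the count read. */
size_t read_entries(FILE *fp, KerEntry *e, size_t max)
{
    size_t i;
    for (i = 0; i < max; i++) {
        if (fread(&e[i].row, sizeof e[i].row, 1, fp) != 1 ||
            fread(&e[i].col, sizeof e[i].col, 1, fp) != 1 ||
            fread(&e[i].val, sizeof e[i].val, 1, fp) != 1)
            break;
    }
    return i;
}
```

Entries must already be in column order before they are written, as required above.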

  (2). Information file of the kernel matrix data
       1st column: column index;
       2nd column: number of non-zeros in this column;
       3rd column: displacement (offset) of this column's first non-zero from the beginning of the data file.
       (a zero in the third column means that the column has no non-zeros)
       Format: ASCII
       Example:
       (int)   (int)   (long long)
       ......
       21      0       0
       22      1       1
       23      1       2
       24      1       3
       ......
       684     6       655
       685     4       661
       686     5       665
       687     0       0
       ......
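The example rows are consistent with the third column being a 1-based offset counted in non-zero elements: column 22 (1 non-zero) starts at 1, column 23 at 2, and column 685 at 661 = 655 + 6, right after the 6 non-zeros of column 684. Under that reading, and assuming each record in the data file occupies 16 bytes (int row, int column, double value), a column's first record can be located as below. The record size and the function names are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical record size: 4-byte int row + 4-byte int col + 8-byte double. */
#define KER_RECORD_BYTES 16LL

/* Byte position of a column's first non-zero, given its info-file offset
   (assumed 1-based, counted in elements). Returns -1 for an empty column. */
long long column_byte_pos(long long offset)
{
    if (offset <= 0)
        return -1;                      /* offset 0 marks an empty column */
    return (offset - 1) * KER_RECORD_BYTES;
}

/* Parse one info-file line "colIdx nnz offset"; returns 1 on success. */
int parse_info_line(const char *line, int *col, int *nnz, long long *offset)
{
    return sscanf(line, "%d %d %lld", col, nnz, offset) == 3;
}
```

A reader would seek to the returned position and then read `nnz` consecutive records.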


B. Input files of the damping matrix
  (1). Row-sorted damping matrix data (stores ONLY the non-zero elements of the damping matrix)
       1-based row and column indexing, sorted by ROW
       Format: binary
       Example (shown in ASCII):
       rowIdx(int) colIdx(int) val(double)
       1           1           1.0
       2           1           1.0
       2           2           -2.0
       2           3           1.0
       3           1           1.0
       3           10          -2.0
       3           19          1.0
       4           1           1.0
       .........

  (2). Column-sorted damping matrix data (stores ONLY the non-zero elements)
       1-based row and column indexing, sorted by COLUMN
       Format: binary
       Example (shown in ASCII):
       rowIdx(int) colIdx(int) val(double)
       1           1           1.0
       2           1           1.0
       3           1           1.0
       4           1           1.0
       5           1           0.5
       6           1           0.5
       7           1           0.5
       2           2           -2.0
       8           2           1.0
       .........
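Both damping files hold the same non-zeros in different orders. A minimal sketch of deriving the column-sorted order from the row-sorted one with the standard library's `qsort`, assuming the entries fit in memory as (row, col, value) triplets; `DampEntry` and `sort_damping_by_col` are hypothetical names. For damping matrices with hundreds of millions of non-zeros an out-of-core or parallel sort would be needed instead.

```c
#include <stdlib.h>

/* One non-zero of the damping matrix (1-based indices). */
typedef struct {
    int row;
    int col;
    double val;
} DampEntry;

/* Order by column, then by row within a column, as in the example above. */
static int cmp_by_col(const void *a, const void *b)
{
    const DampEntry *x = a, *y = b;
    if (x->col != y->col)
        return (x->col > y->col) - (x->col < y->col);
    return (x->row > y->row) - (x->row < y->row);
}

/* Reorder a row-sorted entry array in place into column-sorted order. */
void sort_damping_by_col(DampEntry *e, size_t n)
{
    qsort(e, n, sizeof *e, cmp_by_col);
}
```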

  (3). Number of non-zeros in each row of the row-sorted damping matrix data
       Format: binary (double)
       Example (shown in ASCII):
       nnzPerRow(int)
       1
       3
       3
       3
       .....

  (4). Number of non-zeros in each column of the column-sorted damping matrix data
       Format: binary (double)
       Example (shown in ASCII):
       nnzPerColumn(int)
       7
       8
       11
       11
       .......
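The per-row and per-column counts can be derived from the triplets themselves. A minimal sketch, assuming 1-based indices and storing the counts as doubles to follow the "binary (double)" Format lines; the example headers label the values (int), so confirm the expected type against the PLSQR3 reader. `count_nnz` and `write_counts` are hypothetical names.

```c
#include <stdio.h>
#include <stddef.h>

/* Count non-zeros per index (row or column) from 1-based indices.
   counts[0..n_dim-1] is overwritten; out-of-range indices are ignored. */
void count_nnz(const int *idx, size_t nnz, double *counts, int n_dim)
{
    for (int i = 0; i < n_dim; i++)
        counts[i] = 0.0;
    for (size_t k = 0; k < nnz; k++)
        if (idx[k] >= 1 && idx[k] <= n_dim)
            counts[idx[k] - 1] += 1.0;
}

/* Write the counts as raw native-order doubles; returns 1 on success. */
int write_counts(FILE *fp, const double *counts, int n_dim)
{
    return fwrite(counts, sizeof *counts, (size_t)n_dim, fp) == (size_t)n_dim;
}
```

Passing the row indices of B(1) gives the B(3) file; passing the column indices of B(2) gives the B(4) file.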


C. Input file of the measurement vector
       Measurement values corresponding to the rows of the kernel matrix (the values corresponding to the
       damping matrix rows are zeros and are generated by the program)
       Format: ASCII
       Example:
       measurement(ASCII)
       -0.9897
       -1.8150
        0.0829
       -0.2884
       -0.6363
       ........
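The full right-hand side therefore has row_k measured values followed by row_d zeros, b = [b_k; 0]. A minimal sketch of that assembly on a single process, assuming whitespace-separated values in the ASCII file; `build_rhs` is a hypothetical name, and PLSQR3 itself distributes the vector across processes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Build b = [b_k; 0]: read row_k measurements from an ASCII stream, then
   leave row_d zeros for the damping rows. Returns a malloc'd array of
   row_k + row_d doubles, or NULL on allocation or read failure. */
double *build_rhs(FILE *fp, long row_k, long row_d)
{
    double *b = calloc((size_t)(row_k + row_d), sizeof *b);  /* zero-filled */
    if (b == NULL)
        return NULL;
    for (long i = 0; i < row_k; i++) {
        if (fscanf(fp, "%lf", &b[i]) != 1) {  /* fewer values than row_k */
            free(b);
            return NULL;
        }
    }
    return b;  /* b[row_k .. row_k+row_d-1] stay 0.0 */
}
```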


D. Execution command
   mpiexec -np 16 /EXE/PATH/PLSQR3 -dir /YOUR/DATA/PATH -ker_f matrix_bycol.mm2.bin -ker_i matrix_bycol.mm2.info -damp_f damp_row_data.bin -damp_f_bycol damp_col_data.bin -damp_i damp_row_info.bin -damp_i_bycol damp_col_info.bin -b_k measurement.list -row_k 100 -row_d 1910672 -col 302940 -itn 100 -row_ptn damp_row.index -col_ptn ker_col.index

   -dir: data directory; all data files must be in this directory
   -ker_f: kernel binary file, sorted by column (details in A(1))
   -ker_i: kernel information file (details in A(2))
   -damp_f: damping binary file, sorted by row (details in B(1))
   -damp_f_bycol: damping binary file, sorted by column (details in B(2))
   -damp_i: damping information for the row-sorted damping matrix (details in B(3))
   -damp_i_bycol: damping information for the column-sorted damping matrix (details in B(4))
   -b_k: measurement vector (details in C)
   -row_k: number of rows of the kernel matrix
   -row_d: number of rows of the damping matrix
   -col: number of columns
   -itn: number of iterations
   -row_ptn: optional, row partition file (details in E(3)); if omitted, an even partition is used
   -col_ptn: optional, column partition file (details in E(3)); if omitted, an even partition is used

   Note: if -row_ptn and -col_ptn are not provided, the program partitions rows and columns evenly.
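How PLSQR3 distributes the remainder in an even partition is not stated here. A minimal sketch, assuming the common block convention in which the first n % p processes receive one extra item, producing 1-based inclusive ranges in the same form as the .index files in E(3); `even_range` is a hypothetical name.

```c
/* Even 1-based block partition of n items over p processes.
   The first n % p processes get one extra item. Stores the inclusive
   range of process r (0-based) in *first and *last. */
void even_range(long long n, int p, int r, long long *first, long long *last)
{
    long long base = n / p;
    long long rem = n % p;
    long long start = (long long)r * base + (r < rem ? r : rem);  /* 0-based */
    long long len = base + (r < rem ? 1 : 0);
    *first = start + 1;   /* convert to 1-based */
    *last = start + len;  /* inclusive end */
}
```

For the sample command (-col 302940, 16 processes) this gives 12 ranges of 18934 columns followed by 4 ranges of 18933 columns.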


E. Other programs
  (1). Convert the kernel matrix to the PLSQR3 input format
       source code: PLSQR3.2013.09/PLSQR3_tools/kernel_format
       1.1. Convert ASCII kernel (row) files to binary (input for the next step)
            execution command: ker_ascii2bin kernel_list
                               mpiexec -np 16 ker_ascii2bin ker.list
            input "kernel_list": the 1st line is the number of kernel files; each remaining line is the name of one kernel file.
            input example:
            100
            AZ.BZN_CI.PER_BB.APBPnz
            AZ.CPE_CI.BFS_BB.APBPnz
            AZ.CPE_CI.MUR_BB.APBPnz
            AZ.CRY_CI.BAR_BB.APBPnz
            ....

            format of kernel files (ASCII):
            colIdx  ix  iy  iz  values
            4494    39  46   1  -8.181353e-07
            4495    40  46   1  -1.029945e-06
            4496    41  46   1  -1.101910e-06
            4497    42  46   1  -1.090375e-06
            .......
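Each ASCII kernel file holds one row of the kernel matrix, with the grid indices `ix iy iz` alongside the column index. A minimal sketch of turning one such file into (row, col, value) triplets, assuming the row index is the file's 1-based position in kernel_list and that `ix iy iz` are not needed in the output; `Triplet`, `parse_kernel_line` and `kernel_file_to_triplets` are hypothetical names, not part of ker_ascii2bin.

```c
#include <stdio.h>

typedef struct { int row; int col; double val; } Triplet;

/* Parse "colIdx ix iy iz value"; ix, iy, iz are read and discarded.
   Returns 1 on success, 0 for lines that do not match (e.g. a header). */
int parse_kernel_line(const char *line, int *col, double *val)
{
    int ix, iy, iz;
    return sscanf(line, "%d %d %d %d %lf", col, &ix, &iy, &iz, val) == 5;
}

/* Convert the matching lines of one kernel file into triplets tagged with
   the given row index. Returns the number of triplets stored (<= max). */
size_t kernel_file_to_triplets(FILE *fp, int row, Triplet *out, size_t max)
{
    char line[256];
    size_t n = 0;
    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        int col;
        double val;
        if (parse_kernel_line(line, &col, &val)) {
            out[n].row = row;
            out[n].col = col;
            out[n].val = val;
            n++;
        }
    }
    return n;
}
```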

       1.2. Collect column information
            execution command: Ker2PLSQR3_preprocess kernel/path/ binary_kernel_list matrix_column_number
                               Ker2PLSQR3_preprocess PLSQR3.2013.09/data ker_bin.list 302940
            input "binary_kernel_list": the 1st column is the name of a binary kernel file; the 2nd column is its number of non-zero elements
            input example:
            AZ.BZN_CI.PER_BB.APBPnz.bin 8437
            AZ.CPE_CI.BFS_BB.APBPnz.bin 13062
            AZ.CPE_CI.MUR_BB.APBPnz.bin 8957
            AZ.CRY_CI.BAR_BB.APBPnz.bin 12193
            AZ.CRY_CI.SDR_BB.APBPnz.bin 10485
            ......
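A minimal sketch of the bookkeeping this step involves, assuming it accumulates per-column non-zero counts over all binary kernel files and then derives 1-based element offsets with 0 for empty columns, as in the A(2) example. The function names are hypothetical, and the real Ker2PLSQR3_preprocess may do more.

```c
#include <stddef.h>

/* Add the (1-based) column indices of one kernel file's entries to the
   running per-column counts nnz[0..ncol-1]. */
void accumulate_columns(const int *cols, size_t n, int *nnz, int ncol)
{
    for (size_t k = 0; k < n; k++)
        if (cols[k] >= 1 && cols[k] <= ncol)
            nnz[cols[k] - 1]++;
}

/* From the final counts, fill the 1-based element offset of each column's
   first entry in the column-sorted data file, or 0 for an empty column. */
void column_offsets(const int *nnz, long long *offset, int ncol)
{
    long long next = 1;  /* 1-based position of the next entry */
    for (int j = 0; j < ncol; j++) {
        offset[j] = nnz[j] > 0 ? next : 0;
        next += nnz[j];
    }
}
```

With counts 0, 1, 1, 1 this reproduces the offsets 0, 1, 2, 3 of columns 21 to 24 in the A(2) example.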

       1.3. Convert the binary files to the PLSQR3 input format
            execution command: Ker2PLSQR3 kernel/path binary_kernel_list matrix_column_number output_of_1.2
                               Ker2PLSQR3 PLSQR3.2013.09/data ker_bin.list 302940 PLSQR3.2013.09/data/col_info.txt
            The outputs are input files for PLSQR3 (the kernel files described in A).

  (2). Generate the reordered damping matrix for PLSQR3
       source code: PLSQR3.2013.09/PLSQR3_tools/damping_format
       execution command: damping_binary.py 1 1 damp 99 153 10 2 1.0 1.0 1.0 1.0 1.0 1.0 1.0
       NOTE: this code only generates identity and Laplacian damping.
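The row-sorted example in B(1) contains second-difference rows of the form [1, -2, 1]: row 2 at columns 1, 2, 3, and row 3 at columns 1, 10, 19, consistent with the same stencil along a grid direction with column stride 9. A minimal sketch of generating such rows for a 1-D grid in row-sorted triplet form. This is not damping_binary.py, whose arguments are not documented here; the boundary handling (dropping missing neighbours), the `weight` parameter and all names are assumptions.

```c
#include <stddef.h>

typedef struct { int row; int col; double val; } DampEntry;

/* Row-sorted triplets of a 1-D second-difference operator on n cells:
   row i (1-based) gets weight * [1, -2, 1] at columns i-1, i, i+1, with
   out-of-range neighbours dropped at both ends.
   out needs room for 3*n entries; returns the number written. */
size_t laplacian_1d(int n, double weight, DampEntry *out)
{
    size_t k = 0;
    for (int i = 1; i <= n; i++) {
        if (i > 1)
            out[k++] = (DampEntry){i, i - 1, weight};
        out[k++] = (DampEntry){i, i, -2.0 * weight};
        if (i < n)
            out[k++] = (DampEntry){i, i + 1, weight};
    }
    return k;
}
```

A 3-D version would emit one such block per direction, using column strides of 1, nx and nx*ny.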

  (3). Load balancing
       source code: PLSQR3.2013.09/PLSQR3_tools/load_balancing
       execution command: load_balance_col_nz kernel_info_file info_file_of_col-sorted_damping col_number damping_row_number kernel_non-zero_number damping_non-zero_number col_ratio elem_ratio processor_number
                          load_balance_col_nz matrix_bycol_v4.mm2.info damp_v7I_D001_S001_col_data.bin 38093195 261330576 24384107533 818542016 2.125 1.45 640

      There are two output files:

      ker_col.index: partition of the kernel columns and vector x
      damp_row.index: partition of the damping rows and vector y

      In the "ker_col.index" file, each line stores the column range of the kernel matrix assigned to one core. For example,
      1         12729592   ==> column range of the kernel matrix (vector x) for the first core
      12729593  16414717   ==> column range of the kernel matrix (vector x) for the second core

      Note that indices start from 1.

      In the "damp_row.index" file, each line stores the row range of the damping matrix assigned to one core. For example,
      1         83530320    ==> row range of the damping matrix (vector y) for the first core
      83530321  109251895   ==> row range of the damping matrix (vector y) for the second core
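The algorithm inside load_balance_col_nz, including how col_ratio and elem_ratio are used, is not described here. As a simplified illustration of balancing by non-zero count rather than by column count, a greedy contiguous partition is sketched below. It ignores the damping matrix and both ratios, requires ncol >= p, and `greedy_col_partition` is a hypothetical name.

```c
/* Greedy contiguous partition of ncol columns into p parts, each holding
   roughly total/p non-zeros. nnz[j] is the count of column j+1.
   first[r] and last[r] receive 1-based inclusive ranges.
   Requires ncol >= p; every part gets at least one column. */
void greedy_col_partition(const long long *nnz, long long ncol, int p,
                          long long *first, long long *last)
{
    long long total = 0, acc = 0, j = 0;  /* j = columns assigned so far */
    for (long long c = 0; c < ncol; c++)
        total += nnz[c];

    for (int r = 0; r < p; r++) {
        long long target = total * (r + 1) / p;   /* cumulative goal */
        long long max_end = ncol - (p - 1 - r);   /* keep 1 column per later part */
        first[r] = j + 1;
        do {                                      /* take at least one column */
            acc += nnz[j];
            j++;
        } while (j < max_end && acc < target);
        if (r == p - 1)                           /* last part takes the rest */
            while (j < ncol)
                acc += nnz[j++];
        last[r] = j;                              /* 1-based inclusive end */
    }
}
```

Printing `first[r]` and `last[r]` one pair per line gives a file with the same layout as ker_col.index.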