===============
 * Parallel LSQR v3 (PLSQR3)
 *
 * (Hwang-Ho) He Huang
 * huanghe.us@gmail.com
 * Liqiang Wang
 * wang@cs.uwyo.edu; lwangcs@gmail.com
 * Department of Computer Science, University of Wyoming
 *
 * John M. Dennis
 * dennis@ucar.edu
 * National Center for Atmospheric Research. Boulder, CO
 *
 * En-Jui Lee (rickli92@gmail.com)
 * Po Chen (pchen@uwyo.edu ; pochengeophysics@gmail.com)
 * Department of Geology and Geophysics
 * University of Wyoming
 *
 * Last update: 9/24/2013
 *
 * References:
 *
 * En-Jui Lee, He Huang, John M. Dennis, Po Chen, Liqiang Wang,
 * An optimized parallel LSQR algorithm for seismic tomography,
 * Computers & Geosciences, Volume 61, December 2013, Pages 184-197,
 * ISSN 0098-3004, http://dx.doi.org/10.1016/j.cageo.2013.08.013.
 * (http://www.sciencedirect.com/science/article/pii/S0098300413002409)
 *
 * Huang, H., Dennis, J.M., Wang, L., Chen, P., 2013.
 * A scalable parallel LSQR algorithm for solving large-scale linear system for tomographic problems: a case study in seismic tomography.
 * In: Proceedings of the 2013 International Conference on Computational Science (ICCS). Procedia Computer Science.
 *
 * He Huang, Liqiang Wang, En-Jui Lee, and Po Chen.
 * An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR).
 * In the 2012 International Conference on Computational Science (ICCS) (main track).
 * Procedia Computer Science, Elsevier, 2012.
 *
=================
||PLSQR3 Manual||
=================

source codes of PLSQR3             : PLSQR3.2013.09/source
tools for running PLSQR3           : PLSQR3.2013.09/PLSQR3_tools
example dataset for PLSQR3 testing : PLSQR3.2013.09/data

A. Input files of kernel matrix

  (1). Kernel matrix data (ONLY store non-zero elements in kernel matrix)
       1-based column and row indexing, sort by COLUMN
       Format: binary
       (could use programs in PLSQR3.2013.09/PLSQR3_tools/kernel_format to generate inputs)
       Example: (in ASCII)
           rowIdx(int)  colIdx(int)  value(double)
           4            8            7.708820e-01
           5            8            9.082630e-01
           3            10           2.271540e-01
           7            25           6.604270e-01
           1            26           6.365470e-01
           9            26           4.711560e-01
           ..........

  (2). Information file of kernel matrix data
       1st column is the column index;
       2nd column is the number of non-zero elements in that column;
       3rd column is the displacement (offset) index from the beginning of the data file.
       (A zero in the 3rd column means the column has no non-zero elements.)
       Format: ASCII
       Example:
           (int)  (int)  (long long)
           ......
           21     0      0
           22     1      1
           23     1      2
           24     1      3
           ......
           684    6      655
           685    4      661
           686    5      665
           687    0      0
           ......

B. Input files of damping matrix

  (1). Row-sorted damping matrix data (ONLY store non-zero elements in damping matrix)
       1-based column and row indexing, sort by ROW
       Format: binary
       Example: (in ASCII)
           rowIdx(int)  colIdx(int)  val(double)
           1            1            1.0
           2            1            1.0
           2            2            -2.0
           2            3            1.0
           3            1            1.0
           3            10           -2.0
           3            19           1.0
           4            1            1.0
           .........

  (2). Column-sorted damping matrix data (ONLY store non-zero elements)
       1-based column and row indexing, sort by COLUMN
       Format: binary
       Example: (in ASCII)
           rowIdx(int)  colIdx(int)  val(double)
           1            1            1.0
           2            1            1.0
           3            1            1.0
           4            1            1.0
           5            1            0.5
           6            1            0.5
           7            1            0.5
           2            2            -2.0
           8            2            1.0
           .........

  (3). Number of non-zero elements in each row for row-sorted damping matrix data
       Format: binary (double)
       Example: (in ASCII)
           nnzPerRow(int)
           1
           3
           3
           3
           .....

  (4). Number of non-zero elements in each column for column-sorted damping matrix data
       Format: binary (double)
       Example: (in ASCII)
           nnzPerColumn(int)
           7
           8
           11
           11
           .......

C. Input of the measurement vector

  Measurement values that correspond to the kernel matrix
  (the values that correspond to the damping matrix are zeros and will be generated by the program)
  Format: ASCII
  Example:
      measurement(ASCII)
      -0.9897
      -1.8150
      0.0829
      -0.2884
      -0.6363
      ........

D. Execution command

  mpiexec -np 16 /EXE/PATH/PLSQR3 -dir /YOUR/DATA/PATH \
      -ker_f matrix_bycol.mm2.bin -ker_i matrix_bycol.mm2.info \
      -damp_f damp_row_data.bin -damp_f_bycol damp_col_data.bin \
      -damp_i damp_row_info.bin -damp_i_bycol damp_col_info.bin \
      -b_k measurement.list -row_k 100 -row_d 1910672 -col 302940 -itn 100 \
      -row_ptn damp_row.index -col_ptn ker_col.index

  -dir:          data directory; all the data files must be in this directory
  -ker_f:        kernel binary file, sorted by column (details in A(1))
  -ker_i:        kernel information (details in A(2))
  -damp_f:       damping binary file, sorted by row (details in B(1))
  -damp_f_bycol: damping binary file, sorted by column (details in B(2))
  -damp_i:       damping information for row-sorted damping matrix (details in B(3))
  -damp_i_bycol: damping information for column-sorted damping matrix (details in B(4))
  -b_k:          measurement vector (details in C)
  -row_k:        number of kernel rows
  -row_d:        number of damping rows
  -col:          number of columns
  -itn:          number of iterations
  -row_ptn:      optional, row partition file (details in E(3)); if omitted, use even partition
  -col_ptn:      optional, column partition file (details in E(3)); if omitted, use even partition

  Note: if -row_ptn and -col_ptn are not provided, the program partitions rows and columns evenly.

E. Other programs

  (1). convert kernel matrix to PLSQR3 input format
       source codes: PLSQR3.2013.09/PLSQR3_tools/kernel_format

       1.1. convert ASCII kernel (row) files to binary (input for the next step)
            execution command:
                ker_ascii2bin kernel_list
            example:
                mpiexec -np 16 ker_ascii2bin ker.list
            input "kernel_list": the 1st row is the number of kernel files and the remaining rows are the names of the kernel files.
            input example:
                100
                AZ.BZN_CI.PER_BB.APBPnz
                AZ.CPE_CI.BFS_BB.APBPnz
                AZ.CPE_CI.MUR_BB.APBPnz
                AZ.CRY_CI.BAR_BB.APBPnz
                ....
            format of kernel files (ASCII):
                colIdx  ix  iy  iz  values
                4494    39  46  1   -8.181353e-07
                4495    40  46  1   -1.029945e-06
                4496    41  46  1   -1.101910e-06
                4497    42  46  1   -1.090375e-06
                .......

       1.2. collect column information
            execution command:
                Ker2PLSQR3_preprocess kernel/path/ binary_kernel_list matrix_column_number
            example:
                Ker2PLSQR3_preprocess PLSQR3.2013.09/data ker_bin.list 302940
            input "binary_kernel_list": the 1st column is the name of the binary kernel file; the 2nd column is its number of non-zero elements
            input example:
                AZ.BZN_CI.PER_BB.APBPnz.bin 8437
                AZ.CPE_CI.BFS_BB.APBPnz.bin 13062
                AZ.CPE_CI.MUR_BB.APBPnz.bin 8957
                AZ.CRY_CI.BAR_BB.APBPnz.bin 12193
                AZ.CRY_CI.SDR_BB.APBPnz.bin 10485
                ......

       1.3. convert binary files to PLSQR3 input format
            execution command:
                Ker2PLSQR3 kernel/path binary_kernel_list matrix_column_number output_of_1.2
            example:
                Ker2PLSQR3 PLSQR3.2013.09/data ker_bin.list 302940 PLSQR3.2013.09/data/col_info.txt
            The outputs are the input files of PLSQR3.

  (2). reordered damping matrix for PLSQR3
       source code: PLSQR3.2013.09/PLSQR3_tools/damping_format
       execution command:
           damping_binary.py 1 1 damp 99 153 10 2 1.0 1.0 1.0 1.0 1.0 1.0 1.0
       NOTE: this code only generates identity & Laplacian damping

  (3). load balancing
       source code: PLSQR3.2013.09/PLSQR3_tools/load_balancing
       execution command:
           load_balance_col_nz kernel_info_file info_file_of_col-sorted_damping col_number damping_row_number kernel_non-zero_number damping_non-zero_number col_ratio elem_ratio processor_number
       example:
           load_balance_col_nz matrix_bycol_v4.mm2.info damp_v7I_D001_S001_col_data.bin 38093195 261330576 24384107533 818542016 2.125 1.45 640

       There are two output files:
           ker_col.index  : kernel column and vector x partition
           damp_row.index : damping row and vector y partition

       In the "ker_col.index" file, each row stores the column range of the kernel matrix assigned to one core.
       For example,
           1 12729592          ==> column range of the kernel matrix (vector x) for the first core
           12729593 16414717   ==> column range of the kernel matrix (vector x) for the second core
       Note that indices start from 1.

       In the "damp_row.index" file, each row stores the row range of the damping matrix assigned to one core.
       For example,
           1 83530320          ==> row range of the damping matrix (vector y) for the first core
           83530321 109251895  ==> row range of the damping matrix (vector y) for the second core
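
       The partition files described in E(3) can be sanity-checked before they are passed to -col_ptn / -row_ptn.
       The following is only a minimal sketch in Python, not part of the PLSQR3 tools. It assumes that each row
       of an index file holds two whitespace-separated 1-based integers (start end), that the "==>" annotations
       shown in this manual do not appear in the real files, and that consecutive ranges are contiguous, as in the
       examples for the first and second core. The helper names parse_partition and check_partition are hypothetical.

```python
def parse_partition(text):
    """Parse 'start end' rows into a list of (start, end) tuples.

    Assumption: rows are whitespace-separated integers; blank rows are skipped.
    """
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete rows
        ranges.append((int(fields[0]), int(fields[1])))
    return ranges


def check_partition(ranges, total):
    """Return True if the ranges cover 1..total with no gaps or overlaps.

    Assumption: every core owns a non-empty, contiguous, 1-based range.
    """
    expected_start = 1
    for start, end in ranges:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == total + 1


# The two rows shown for "ker_col.index" in this manual.
sample = "1 12729592\n12729593 16414717\n"
ranges = parse_partition(sample)
print(ranges)                             # [(1, 12729592), (12729593, 16414717)]
print(check_partition(ranges, 16414717))  # True
```

       For a real file, read its contents with open(path).read() and pass the value given to -col (for
       ker_col.index) or -row_d (for damp_row.index) as the total.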