NAS Parallel Benchmarks 3.0 OpenMP C version

/***********************************************************************
*
*                    NAS Parallel Benchmarks 3.0
*
*                    Unofficial OpenMP C Version
*
*  Copyright 2014 University of Versailles Saint Quentin en Yvelines
*  Copyright 2000 Omni OpenMP Compiler Project 
*  Copyright 1991-2014 NASA Advanced Supercomputing Division
*
*                         November 08, 2014
*
***********************************************************************/

1. Introduction

This package contains an unofficial C version of the NAS Parallel Benchmarks
(NPB) 3.0 OpenMP suite.  The benchmarks are derived from the unofficial C
version of NPB 2.3 released by the Omni OpenMP Compiler Project.

The benchmarks were modified to match the official NPB 3.0 Fortran
benchmarks.  In particular, the benchmarks in this release follow the same
parallel region structure as the official NPB 3.0 Fortran version.

2. Change Log

This section tracks all the modifications that were made to update the
Omni OpenMP 2.3 unofficial C version to the 3.0 version.

Each modification includes an annotation of the form XXX:YYY,
where XXX is the line number in the 2.3 version and YYY is the 
line number in the 3.0 version.

BT
  * Transforms the initialization process into distinct parallel regions:
    129:127 - initialize  
    131:128 - lhsinit
    133:129 - exact_rhs
  * Splits adi function into separate parallel regions:
    211:206 - compute_rhs
    213:209 - x_solve
    215:212 - y_solve
    217:215 - z_solve
    219:218 - add
  * TODO - Initialization part still differs from the official 3.0 version
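
The adi restructuring listed above can be illustrated with a minimal sketch.
It is not taken from the BT source: phase_scale and phase_shift are
hypothetical stand-ins for the solver phases (compute_rhs, x_solve, ...), and
the sketch assumes that each call in adi is simply wrapped in its own parallel
region. Both variants produce the same result, with or without OpenMP enabled
at compile time.

```c
#include <assert.h>

/* Hypothetical stand-ins for the solver phases called from adi
   (compute_rhs, x_solve, ...). Each contains only an orphaned
   work-sharing loop and relies on its caller for the parallel region. */
static void phase_scale(double *u, int n, double a)
{
    int i;
    #pragma omp for
    for (i = 0; i < n; i++)
        u[i] *= a;
}

static void phase_shift(double *u, int n, double b)
{
    int i;
    #pragma omp for
    for (i = 0; i < n; i++)
        u[i] += b;
}

/* 2.3-style structure: one parallel region spans every phase. */
void adi_single_region(double *u, int n)
{
    assert(u != 0 && n >= 0);
    #pragma omp parallel
    {
        phase_scale(u, n, 2.0);
        phase_shift(u, n, 1.0);
    }
}

/* 3.0-style structure: each phase runs in its own parallel region. */
void adi_split_regions(double *u, int n)
{
    assert(u != 0 && n >= 0);
    #pragma omp parallel
    phase_scale(u, n, 2.0);

    #pragma omp parallel
    phase_shift(u, n, 1.0);
}
```

With GCC, OpenMP is enabled by the -fopenmp flag; without it the pragmas are
ignored and both functions run serially.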

CG
  * Updates sparse in makea
    761:756 - add a parallel loop that preloads data pages
  * Splits the first main parallel region
    187:195 - initialization parallel regions in main are reduced, and single regions are replaced by serial code
    356:347 - conj grad (see below)
    209:219 - parallelization of three loops
  * Decomposes the conj grad function into parallel regions
    402:395 - initialize conj grad algorithm
    413:405 - conj grad iteration loop
    551:551 - compute residual norm explicitly             
  * Splits the second main parallel region
    261:260 - conj grad
    275:271 - parallelization of the two loops after conj grad
  * 78:78 - Remove temporary array w
  * 375:367 - Remove static variables in conj grad since they are initialized at each iteration
  * 500:494 - Add barrier after reduction because LLVM OpenMP does not support implicit synchronization after a reduction
  * 504:498 - Remove single because all the variables are private after the parallel region updates
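
The reduction-related entries above can be pictured with the following
minimal sketch. It is an illustration only, not the conj grad code: the
function dot_then_scale and its arguments are hypothetical. It shows a
reduction into a local (non-static) variable followed by an explicit barrier
before the reduced value is read. Under the OpenMP specification a
work-sharing loop without nowait already ends with an implied barrier, so the
explicit one is redundant there but harmless.

```c
#include <assert.h>

/* Hypothetical example: rho = x . y, then z = rho * x. */
double dot_then_scale(const double *x, const double *y, double *z, int n)
{
    double rho = 0.0;   /* local, re-initialized on every call (not static) */
    int j;

    assert(x != 0 && y != 0 && z != 0 && n >= 0);

    #pragma omp parallel default(shared) private(j)
    {
        #pragma omp for reduction(+:rho)
        for (j = 0; j < n; j++)
            rho += x[j] * y[j];

        /* Explicit synchronization before any thread reads rho,
           mirroring the barrier added in the change log. */
        #pragma omp barrier

        #pragma omp for
        for (j = 0; j < n; j++)
            z[j] = rho * x[j];
    }
    return rho;
}
```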

EP
  * Parallelizes x-array initialization
    109:110 - main
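
A parallelized array initialization of this kind might look like the sketch
below. The function name init_x, the fill value, and the use of a combined
parallel for are assumptions made for illustration; the actual code in main
may be structured differently.

```c
#include <assert.h>

/* Hypothetical sketch: fill x in parallel so that each thread
   touches the part of the array it initializes. */
void init_x(double *x, int n)
{
    int i;

    assert(x != 0 && n >= 0);

    #pragma omp parallel for default(shared) private(i)
    for (i = 0; i < n; i++)
        x[i] = -1.0e99;   /* assumed sentinel value */
}
```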

FT
  * Splits the first main parallel region   
    123:123 - compute_indexmap
    131:130 - fft (see below)
  * Splits the second main parallel region   
    176:168 - evolve
    164:158 - fft
    187:177 - fft
    845:849 - checksum 
  * Decomposes fft into parallel regions
    514:501 - cffts1
    562:556 - cffts2
    607:606 - cffts3    
  * Defines y0 and y1 in cffts[123] on the stack because Fortran has no pointers
  * TODO - Insert init_ui to touch the initial data

IS
  * Already in C

LU
  * Splits the boundary setup, the initialization of dependent variables, and the forcing term computation into parallel regions
    125:123 - setbv
    130:128 - setiv
    135:133 - erhs
  * Produces separate parallel regions for ssor
    3073:3092 - l2norm
    3068:3087 - rhs
    3064:3083 - SSOR initialization
    3094:3112 - SSOR iteration 
  * TODO - Parallelize post computation part
    - error 
    - pintgr

MG
  * Turns the omp parallel region into an omp parallel loop for:
    1239:1211 - zero3
  * Splits the first big parallel region
    238:233 - norm2u3
    257:249 - mg3P 
    258:243 - resid  
  * Splits the main iteration big parallel region
    273:262 - resid
    274:263 - norm2u3
    277:266 - mg3P   
  * Transforms the omp parallel region into omp parallel loops from main and mg3P
    846:826 - norm2u3  
    527:516 - resid
    608:595 - rprj3
    1245:1217 - zero3    
    463:454 - psinv
    684:669 - interp (produces two parallel regions)
  * 820:835 - Remove static because of algorithmic changes 
  * 836:862 - Algorithmic changes 
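
The region-to-loop transformation listed above can be sketched as follows for
a zero3-like routine. This is an illustration under stated assumptions: the
name zero3_sketch and the flat array layout are hypothetical and chosen to
keep the example short; they are not the layout used in the MG source.

```c
#include <assert.h>

/* Hypothetical sketch of a zero3-like routine on a flat
   n3 x n2 x n1 array (flat layout is an assumption). */
void zero3_sketch(double *z, int n1, int n2, int n3)
{
    int i1, i2, i3;

    assert(z != 0 && n1 >= 0 && n2 >= 0 && n3 >= 0);

    /* Before: "#pragma omp parallel" opened by the caller plus an
       orphaned "#pragma omp for" here.
       After: a single combined parallel loop construct. */
    #pragma omp parallel for default(shared) private(i1, i2, i3)
    for (i3 = 0; i3 < n3; i3++)
        for (i2 = 0; i2 < n2; i2++)
            for (i1 = 0; i1 < n1; i1++)
                z[(i3 * n2 + i2) * n1 + i1] = 0.0;
}
```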
  
SP
  * Splits adi function into separate parallel regions:
    205:204 - compute_rhs
    208:207 - txinvr
    211:210 - x_solve and ninvr
    214:213 - y_solve and tzetar
    216:216 - z_solve and pinvr
    220:219 - add
  * TODO - Initialize has to be updated 
    - 654:659 parallelize the function
  * TODO - Post computation verify has to be parallelized   
    221:226 - error_norm 
    
3. Installation

The package should contain the following files/directories:

  README - this file
  README.omc - Readme of the Omni OpenMP Compiler release
  LOG.omc - Change log of the Omni OpenMP Compiler release
  Makefile - makefile for the suite (not modified from NPB2.3-omp)
  Doc - documentation (not modified from NPB2.3-serial)
  BT, CG, EP, FT, IS, LU, MG, SP - directory for each program
  bin - directory to put executable files
  common - common routines (only the version display changed from NPB2.3-omp)
  config - configuration files used by 'make' (not modified from NPB2.3-omp)
  sys - utilities (only the version display changed from NPB2.3-omp)
  
To use the suite, edit file 'make.def' in directory 'config'.
You must specify the names of the compiler and linker, and the compiler
options.  For more details, refer to file "README.install" in subdirectory
"Doc" and to "README.omc".

4. Information

- Original benchmark suite

  Information on the NAS Parallel Benchmarks is available at
  http://www.nas.nasa.gov/NAS/NPB/.

- C translation

  Information on the OpenMP C versions and the Omni OpenMP compiler is 
  available at http://pdplab.trc.rwcp.or.jp/pdperf/Omni/.

- 3.0 NAS update 
  
  Information on the translation of the NAS Parallel Benchmarks from 2.3 to
  3.0 is available at http://benchmark-subsetting.github.io/cNPB/.
  Note that NAS does not support the OpenMP C versions, and the Omni OpenMP
  Compiler Project does not support this 3.0 restructured version.
  License information for the NAS Parallel Benchmarks can be found at
  http://opensource.org/licenses/nasa1.3.php.