STREAmS

Supersonic TuRbulEnt Accelerated navier stokes Solver

Primary language: Fortran. License: GNU General Public License v3.0 (GPL-3.0).

A new version of the solver, STREAmS-2.0, is now available at https://github.com/STREAmS-CFD/STREAmS-2. We strongly recommend using the new release.

!=============================================================
!
! ███████╗████████╗██████╗ ███████╗ █████╗ ███╗   ███╗███████╗
! ██╔════╝╚══██╔══╝██╔══██╗██╔════╝██╔══██╗████╗ ████║██╔════╝
! ███████╗   ██║   ██████╔╝█████╗  ███████║██╔████╔██║███████╗
! ╚════██║   ██║   ██╔══██╗██╔══╝  ██╔══██║██║╚██╔╝██║╚════██║
! ███████║   ██║   ██║  ██║███████╗██║  ██║██║ ╚═╝ ██║███████║
! ╚══════╝   ╚═╝   ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═╝     ╚═╝╚══════╝
!
! Supersonic TuRbulEnt Accelerated navier stokes Solver
!
!=============================================================

STREAmS performs Direct Numerical Simulations of compressible turbulent flows in Cartesian geometry solving the unsteady, fully compressible Navier-Stokes equations for a perfect gas. Currently, three canonical wall-bounded flows can be simulated:

  • compressible turbulent channel flow
  • compressible zero-pressure-gradient turbulent boundary layer
  • supersonic oblique shock-wave/turbulent boundary-layer interaction

STREAmS can be used on both local clusters and massively parallel HPC architectures, including those based on Graphics Processing Units (GPUs).

References

Bernardini, M., Modesti, D., Salvadore, F., & Pirozzoli, S. (2021). STREAmS: A high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flows. Computer Physics Communications, 263, 107906. https://doi.org/10.1016/j.cpc.2021.107906

Compiling

STREAmS requires (1) a Fortran compiler and (2) an MPI library. For the GPU CUDA version, the NVIDIA compiler is required (tested using PGI 19.4 or more recent compilers). A Makefile with predefined settings is available to facilitate compiling. Different compiling modes can be selected by changing the three variables:

COMPILE 
MODE    
PREC    

COMPILE currently supports these choices:

  • pgi-cuda: PGI compiler, CUDA Fortran asynchronous version, with MPI library (tested with OpenMPI provided by PGI)
  • pgi-cuda-sync: PGI compiler, CUDA Fortran synchronous version, with MPI library (tested with OpenMPI provided by PGI)
  • pgi: PGI compiler, CPU version, with MPI library (tested with OpenMPI provided by PGI)
  • intel: Intel compiler with MPI library (tested with IntelMPI library)
  • gnu: GNU compiler with MPI library (tested with OpenMPI and MPICH)
  • ibmxl: IBM XL compiler with MPI library (tested with IBM MPI library)
  • cray-cuda: PGI compiler with the Cray MPICH library, without CUDA-aware MPI support (currently targeted at the Piz Daint cluster)

MODE can be one of:

  • opt: optimized compilation, for standard production runs
  • debug: debugging compilation, with run-time checks enabled and backtracing when available

PREC defines the precision of the floating-point numbers:

  • double: double precision floating point numbers
  • single: single precision floating point numbers

Running

STREAmS can usually be executed using a standard MPI launcher, e.g. mpirun. In addition to the executable, the running folder must contain:

  • input.dat: file defining the physical and numerical setup of the simulation, to be customized as needed. A detailed description of the input.dat file is given below. Some examples of input.dat files are available in the examples folder.
  • database_bl.dat: only required for initialization of boundary layer and shock-boundary layer interaction flows (this file does not have to be modified by the user). The file is available in the examples folder.

To run a simulation, type, e.g.:

mpirun -np 8 ./streams

or (for SLURM jobs)

srun ./streams

For CUDA versions in cluster environments, you must distribute MPI processes according to the number of GPUs available on each node. For the CINECA Marconi-100 cluster (4 GPUs per node), a possible submission script using 8 GPUs is:

#!/bin/bash
#SBATCH -N 2 
#SBATCH --tasks-per-node 4
#SBATCH --mem=64G 
#SBATCH --partition=debug
#SBATCH --time=00:30:00
#SBATCH --gres=gpu:4
module load profile/global pgi
srun ./streams

For the CSCS Piz Daint cluster (1 GPU per node), a possible submission script using 8 GPUs is:

#!/bin/bash
#SBATCH -N 8
#SBATCH --tasks-per-node 1 
#SBATCH --mem=15G 
#SBATCH --partition=debug 
#SBATCH --time=00:30:00 
#SBATCH --gres=gpu:1
#SBATCH --constraint=gpu 
module swap PrgEnv-cray PrgEnv-pgi
srun ./streams

Preparing input.dat file

flow_type defines the type of flow. 0 = channel flow, 1 = boundary layer, 2 = shock/boundary-layer interaction

Lx Ly Lz real numbers defining the size of the Cartesian domain along the three coordinate directions (x=streamwise, y=wall-normal, z=spanwise)

Nx Ny Nz integer numbers defining the number of grid nodes along each direction

Ny_wr Ly_wr dy+_w jbgrid specify the wall-normal mesh features. In all cases, dy+_w is the desired spacing at the wall in inner units. When flow_type > 0, Ny_wr denotes the number of grid points in the wall-resolved region, ranging from y=0 (wall) up to y=Ly_wr, where a sinh mapping is applied. When flow_type = 1 (turbulent boundary layer), both Ny_wr and Ly_wr can be specified, and a geometric progression is applied from y=Ly_wr up to y=Ly. When flow_type = 2 (shock/boundary-layer interaction), Ny_wr must be specified but Ly_wr is computed automatically.

ng visc_ord ep_ord weno_par. Parameters to control the order of accuracy of the discretization. ng specifies the number of ghost nodes. visc_ord is the order of accuracy for the computation of the viscous terms (must be even and <= 6). ep_ord is the order of accuracy for the computation of the convective terms in the smooth flow regions (central scheme, must be even and <= 6). weno_par selects the order of accuracy (order = 2*weno_par-1) for the computation of the convective terms in the shocked flow regions (WENO reconstruction, must be <= 3).
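The limits above can be checked before launching a run. The following is a minimal sketch, not part of STREAmS: the function name weno_order is hypothetical, and the lower bounds (2 for the central orders, 1 for weno_par) are assumptions, since only the upper bounds are stated here.

```python
def weno_order(visc_ord, ep_ord, weno_par):
    """Check the stated limits and return the WENO order 2*weno_par - 1."""
    for name, value in (("visc_ord", visc_ord), ("ep_ord", ep_ord)):
        # Stated rule: even and <= 6. The lower bound of 2 is an assumption.
        if value % 2 != 0 or not 2 <= value <= 6:
            raise ValueError(f"{name} must be even and <= 6, got {value}")
    # Stated rule: weno_par <= 3. The lower bound of 1 is an assumption.
    if not 1 <= weno_par <= 3:
        raise ValueError(f"weno_par must be <= 3, got {weno_par}")
    return 2 * weno_par - 1


order = weno_order(visc_ord=6, ep_ord=6, weno_par=3)
print(f"WENO order: {order}")  # prints "WENO order: 5"
```

The sketch does not check ng, because the relation between ng and the chosen orders is not specified here.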

MPI_x_split MPI_z_split define the MPI decomposition along x (streamwise) and z (spanwise). These numbers must be consistent with Nx, Ny, and Nz. In particular, the following divisions must have zero remainder: Nx/MPI_x_split, Nz/MPI_z_split. Moreover, for flow_type > 0 cases, Nx/MPI_z_split and Ny/MPI_x_split must also have zero remainder. The product MPI_x_split * MPI_z_split must equal the total number of launched MPI processes.
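These divisibility rules are easy to get wrong when changing the grid or the process count. A minimal sketch of a pre-launch check follows; the function name check_mpi_split and the grid sizes in the example are hypothetical, and the rules encoded are only those listed above.

```python
def check_mpi_split(nx, ny, nz, mpi_x_split, mpi_z_split, flow_type, n_procs):
    """Return a list of violated decomposition rules (empty if all hold)."""
    problems = []
    if nx % mpi_x_split != 0:
        problems.append("Nx/MPI_x_split has a nonzero remainder")
    if nz % mpi_z_split != 0:
        problems.append("Nz/MPI_z_split has a nonzero remainder")
    if flow_type > 0:
        # Extra rules for boundary layer and SBLI cases.
        if nx % mpi_z_split != 0:
            problems.append("Nx/MPI_z_split has a nonzero remainder")
        if ny % mpi_x_split != 0:
            problems.append("Ny/MPI_x_split has a nonzero remainder")
    if mpi_x_split * mpi_z_split != n_procs:
        problems.append("MPI_x_split * MPI_z_split differs from the number of MPI processes")
    return problems


# Hypothetical boundary layer setup launched with 8 MPI processes.
issues = check_mpi_split(nx=2048, ny=256, nz=256,
                         mpi_x_split=4, mpi_z_split=2,
                         flow_type=1, n_procs=8)
print(issues if issues else "decomposition is consistent")
```

Returning a list rather than raising on the first failure lets all violated rules be reported at once.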

sensor_threshold xshock_imp deflec_shock. sensor_threshold is the WENO threshold (WENO is active if the shock sensor value exceeds sensor_threshold), xshock_imp is the shock abscissa and deflec_shock is the shock angle (used only when flow_type > 0).

restart num_iter cfl dt_control print_control io_type. restart selects the start type (0 = init, 1 = restart the run and restart the statistics, 2 = restart the run and continue the statistics), num_iter is the total number of iterations, and cfl is the CFL number. The time step is re-evaluated every dt_control iterations. The residual file is printed every print_control iterations, and io_type selects the type of I/O for restart (io_type=0 no I/O, io_type=1 serial I/O, io_type=2 MPI I/O).

Mach Reynolds (friction) temp_ratio visc_type Tref (dimensional) turb_inflow. Mach is the bulk Mach number for channel flow (defined with Twall) and the freestream Mach number for boundary layer and SBLI cases. Reynolds is the target friction Reynolds number. temp_ratio is the ratio between the wall temperature and the adiabatic wall temperature for boundary layer and SBLI cases, whereas for channel flow it is the ratio between the bulk and wall temperatures. visc_type selects the viscosity law (visc_type=1 power law, visc_type=2 Sutherland law, with the dimensional reference temperature specified by Tref). For boundary layers and SBLI, turb_inflow selects the turbulent inflow type: a value > 0 selects digital filtering, and the value is used to control/reduce temperature fluctuations; a value < 0 selects the recycling-rescaling method with constant spanwise shifting, and the value denotes the location of the recycling station.
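To illustrate why Tref only matters for visc_type=2, the sketch below evaluates the two viscosity laws in their textbook nondimensional form. It is not taken from the STREAmS source: the function name, the power-law exponent (0.76), the Sutherland constant (110.4 K, a common value for air), and the default Tref are all assumptions made for illustration.

```python
def viscosity_ratio(t_ratio, visc_type, t_ref=160.0,
                    exponent=0.76, sutherland_s=110.4):
    """Return mu/mu_ref for a temperature ratio T/T_ref.

    exponent and sutherland_s are assumed textbook values for air,
    not values read from STREAmS.
    """
    if visc_type == 1:
        # Power law: mu/mu_ref = (T/T_ref)**n, independent of T_ref.
        return t_ratio ** exponent
    if visc_type == 2:
        # Sutherland law: needs the dimensional T_ref to scale the constant S.
        s = sutherland_s / t_ref
        return t_ratio ** 1.5 * (1.0 + s) / (t_ratio + s)
    raise ValueError(f"unknown visc_type: {visc_type}")


for law in (1, 2):
    print(law, viscosity_ratio(2.0, visc_type=law))
```

Both laws return 1 at T = T_ref by construction; they differ away from the reference temperature, and the Sutherland result changes with the dimensional Tref.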

stat_control xstat_num. Cumulative flow statistics are evaluated every stat_control iterations. xstat_num is the number of streamwise stations at which flow statistics are extracted and is only meaningful for boundary layer and SBLI cases (flow_type > 0).

xstat_list streamwise locations of boundary layer flow statistics (flow_type > 0)

dtsave dtsave_restart enable_plot3d enable_vtk. dtsave is the time interval between field outputs, dtsave_restart is the time interval between outputs of restart files, enable_plot3d>0 activates the plot3d format output, and enable_vtk>0 activates the vtk format output.

rand_type if < 0 produces non-reproducible random sequences based on the current time, if >= 0 produces random sequences that are reproducible across different runs with the same configuration.

Typical input files for canonical flow cases are available.

Understanding the output files

Plot3d binary output files -- e.g. plot3dgrid.xyz and field_0001.q -- can be read using Tecplot or Paraview.

VTK Rectilinear grid files -- e.g. field_0001.vtr -- can be read using Paraview.

Other statistics files are automatically produced according to the input configuration.

Channel flow output files

Residual file output_streams.dat

When running channel flow cases (iflow=0) the residuals file (output_streams.dat) contains seven columns:

  1. number of cycles,
  2. elapsed time,
  3. streamwise momentum residual,
  4. pressure gradient,
  5. bulk density (conserved to machine accuracy),
  6. bulk flow velocity (conserved to machine accuracy),
  7. bulk temperature.

For instance, plotting the fourth column against the second shows the time history of the pressure gradient.
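The column layout above can be used to extract such a time history without a plotting tool. The sketch below assumes output_streams.dat is whitespace-separated ASCII with one row per printed iteration; the function name and the sample rows are hypothetical. Fortran-style exponents such as 1.0D-03 are converted before parsing as a precaution.

```python
def pressure_gradient_history(lines):
    """Return (elapsed time, pressure gradient) pairs from 7-column rows."""
    history = []
    for line in lines:
        fields = line.replace("D", "E").replace("d", "e").split()
        if len(fields) != 7:
            continue  # skip blank or incomplete lines
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue  # skip header or non-numeric lines
        # Column 2 is the elapsed time, column 4 the pressure gradient.
        history.append((values[1], values[3]))
    return history


# Hypothetical rows; for a real run, pass the open file instead:
#   with open("output_streams.dat") as handle:
#       history = pressure_gradient_history(handle)
sample = [
    "10 0.5D-01 1.0E-03 2.0E-03 1.0 1.0 1.1",
    "20 1.0E-01 8.0E-04 1.9E-03 1.0 1.0 1.1",
]
history = pressure_gradient_history(sample)
for time, gradient in history:
    print(time, gradient)
```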

Mean flow statistics channstat.prof

The file channstat.prof is printed at the end of each run and contains the mean channel flow statistics, averaged in the homogeneous spatial directions and in time (statistics are progressively updated in time at each restart if idisk=2, or collected from scratch if idisk=1). The file contains 15 columns:

  1. the wall-normal coordinate, normalized with the channel half width
  2. the wall-normal coordinate in viscous units
  3. the wall-normal coordinate transformed according to Trettel & Larsson in viscous units
  4. the mean streamwise velocity averaged according to Favre, normalized with the bulk flow velocity
  5. the mean streamwise velocity averaged according to Favre, normalized with the friction velocity
  6. the mean streamwise velocity transformed according to van Driest, normalized with the friction velocity
  7. the mean streamwise velocity transformed according to Trettel & Larsson in viscous units
  8. the mean density profile, normalized with the mean wall density
  9. the Favre streamwise Reynolds stress, normalized with the wall-shear stress
  10. the Favre wall-normal Reynolds stress, normalized with the wall-shear stress
  11. the Favre spanwise Reynolds stress, normalized with the wall-shear stress
  12. the Favre shear Reynolds stress, normalized with the wall-shear stress
  13. the mean temperature profile, normalized with the wall temperature
  14. the density fluctuations normalized with the mean wall density
  15. the temperature fluctuations, normalized with the wall temperature
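Since columns 1 and 2 hold the same coordinate in outer and viscous units, their ratio gives the friction Reynolds number actually reached by the run, which can be compared with the target Reynolds value in input.dat. The sketch below assumes channstat.prof is whitespace-separated ASCII and that column 1 is measured from the wall and positive; the short column names and function names are hypothetical labels, not identifiers from STREAmS.

```python
# Hypothetical short names for the 15 columns, in file order.
CHANNSTAT_COLUMNS = (
    "y_h", "y_plus", "y_tl_plus", "u_favre_bulk", "u_favre_plus",
    "u_vd_plus", "u_tl_plus", "rho_mean", "tau_xx", "tau_yy",
    "tau_zz", "tau_xy", "t_mean", "rho_rms", "t_rms",
)


def parse_profile(lines, names=CHANNSTAT_COLUMNS):
    """Return a dict mapping each column name to a list of floats."""
    columns = {name: [] for name in names}
    for line in lines:
        fields = line.split()
        if len(fields) != len(names):
            continue  # skip blank or incomplete lines
        for name, field in zip(names, fields):
            columns[name].append(float(field))
    return columns


def friction_reynolds(columns):
    """Estimate Re_tau as y+ / (y/h) at the first point off the wall."""
    for y_h, y_plus in zip(columns["y_h"], columns["y_plus"]):
        if y_h > 0.0:
            return y_plus / y_h
    raise ValueError("no point with y/h > 0 found")


# Hypothetical rows: a wall point and one interior point.
sample = [
    " ".join(["0.0", "0.0"] + ["0.0"] * 13),
    " ".join(["0.5", "100.0"] + ["0.0"] * 13),
]
profile = parse_profile(sample)
re_tau = friction_reynolds(profile)
print(re_tau)
```

Naming the columns once keeps later post-processing readable, for example pairing profile["y_plus"] with profile["u_vd_plus"] for a van Driest velocity profile.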

Boundary layer output files

Residual file output_streams.dat

When running boundary layer cases (iflow=1 or 2) the residuals file (output_streams.dat) contains three columns:

  1. number of cycles,
  2. elapsed time,
  3. streamwise momentum residual.

cf files

The files cf_xxx.dat are ASCII files, and the number xxx refers to the Cartesian streamwise MPI block to which the file belongs. These files are printed at the end of each run. Statistics are progressively updated in time at each restart if idisk=2, or collected from scratch if idisk=1. Each file contains 13 columns:

  1. streamwise coordinate normalized with the inlet boundary layer thickness
  2. skin-friction coefficient
  3. friction Reynolds number
  4. Compressible shape factor
  5. Incompressible shape factor
  6. boundary layer thickness, normalized with the inlet boundary layer thickness
  7. Compressible displacement thickness
  8. Compressible momentum thickness
  9. friction velocity
  10. Reynolds number based on the incompressible momentum thickness
  11. Incompressible friction coefficient, according to van Driest II transformation
  12. Reynolds number based on the compressible momentum thickness
  13. Mean pressure rms at the wall normalized with the wall-shear stress
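Because one cf file is written per streamwise MPI block, a full streamwise distribution requires combining them. The sketch below merges the rows of several blocks and sorts them by the streamwise coordinate in column 1. It assumes whitespace-separated ASCII rows; the function name and sample values are hypothetical.

```python
def merge_cf_blocks(blocks, n_columns=13):
    """Merge rows from several cf files and sort them by column 1 (x)."""
    rows = []
    for lines in blocks:
        for line in lines:
            fields = line.split()
            if len(fields) != n_columns:
                continue  # skip blank or incomplete lines
            rows.append([float(field) for field in fields])
    rows.sort(key=lambda row: row[0])
    return rows


def sample_line(x, cf):
    """Build a hypothetical 13-column row with only x and cf filled in."""
    return " ".join([str(x), str(cf)] + ["0.0"] * 11)


# Two hypothetical blocks given out of order.
block_b = [sample_line(2.0, 0.0021), sample_line(3.0, 0.0020)]
block_a = [sample_line(0.0, 0.0025), sample_line(1.0, 0.0023)]
merged = merge_cf_blocks([block_b, block_a])
for row in merged:
    print(row[0], row[1])  # x and skin-friction coefficient
```

For real output, each block can be an open file, with the file list gathered for example by glob.glob("cf_*.dat") from the standard library.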

stat files

The files stat_nnn.dat are ASCII files containing the mean boundary layer statistics. The files are printed at the end of each run. Statistics are progressively updated in time at each restart if idisk=2, or collected from scratch if idisk=1. The number nnn indicates the global mesh index in the streamwise direction at which statistics are printed. Each file contains 10 columns:

  1. wall-normal coordinate, normalized with the local boundary layer thickness
  2. wall-normal coordinate, normalized with the viscous length scale
  3. the Favre averaged streamwise velocity, normalized with the friction velocity
  4. the streamwise velocity transformed according to van Driest
  5. the density scaled streamwise velocity rms
  6. the density scaled wall-normal velocity rms, in viscous units
  7. the density scaled spanwise velocity rms, in viscous units
  8. the density scaled Reynolds shear stress, in viscous units
  9. the square root of the mean density, normalized with the wall density
  10. The pressure rms, normalized with the wall-shear stress.