This package contains the functions to implement the swept space-time decomposition rule in two dimensions on heterogeneous computing architectures.
The plan is to make this pip and conda installable.
PySweep depends heavily on HDF5, h5py, MPI, and mpi4py. HDF5 has to be built with parallel (MPI) support to function properly. First install MPI and mpi4py through whatever means you prefer, e.g., conda; I personally built mpich and mpi4py from source.
I installed mpich from the source directory as
./configure --prefix=/opt/mpich --with-device=ch4:ofi --disable-fortran |& tee c.txt
make 2>&1 | tee m.txt
make install |& tee mi.txt
I installed mpi4py from the source directory as
python setup.py build --mpicc=/opt/mpich/bin/mpicc
python setup.py install
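A quick sanity check that mpi4py is linked against the intended MPI build (this is standard mpi4py usage, nothing pysweep-specific):
/opt/mpich/bin/mpiexec -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank(), MPI.Get_library_version())"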
Then get the source distributions of HDF5 and h5py. Install HDF5 first via
export CC=/opt/mpich/bin/mpicc #export CC path - change as is needed
./configure --enable-parallel --enable-shared --prefix=/opt/hdf5-1.12.0/ #configure from hdf5 source dir
make
make check
make install #You may or may not need sudo for this depending on your prefix
Next, install h5py from the source directory (git) via
export CC=/opt/mpich/bin/mpicc #no need to repeat this if it was set in the previous step in the same shell
export HDF5_MPI="ON"
python setup.py configure --hdf5=/opt/hdf5-1.12.0
python setup.py build
python setup.py install
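Before moving on, it may be worth confirming that the resulting h5py actually has MPI support; a minimal sketch, assuming the packages built above are on your PYTHONPATH (the file and dataset names are arbitrary):

#parallel_check.py -- run with: /opt/mpich/bin/mpiexec -n 2 python parallel_check.py
from mpi4py import MPI
import h5py

assert h5py.get_config().mpi, "h5py was not built with parallel HDF5"
comm = MPI.COMM_WORLD
with h5py.File("parallel_check.hdf5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("ranks", (comm.Get_size(),), dtype="i")
    dset[comm.Get_rank()] = comm.Get_rank()  #each rank writes its own slot of the shared file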
- The grid used is uniform and rectangular.
- block_size should be a power of two (2^n) and is constrained by your GPU. It must also satisfy mod(block_size, 2*k) == 0, where k is the number of points on each side of the operating point, e.g., k=2 for a 5-point stencil (see the sketch after this list).
- A total of three user-supplied functions must be named accordingly and take specific arguments.
- This code is currently limited to periodic boundary conditions.
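As a concrete illustration of the block_size constraints above (this helper is not part of pysweep):

#illustrative check of the constraints above, not part of pysweep
def valid_blocksize(b, k):
    is_power_of_two = b > 0 and (b & (b - 1)) == 0  #2^n check
    return is_power_of_two and b % (2 * k) == 0     #mod(b, 2*k) == 0 check

assert valid_blocksize(8, 2)       #8 works for a 5-point stencil (k=2)
assert not valid_blocksize(12, 2)  #12 is divisible by 2*k but is not a power of two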
The first two functions are Python functions: a step function and a set_globals function. As their names suggest, they are used to take a time step and to set global variables for the simulation; simple examples are provided below.
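The Python examples below only show set_globals; as a rough orientation, a minimal, hypothetical sketch of what the Python step function might look like is given here. The signature and array layout are assumptions made for illustration (they mirror the CUDA step in example.cu further down), not pysweep's documented API.

#hypothetical Python step sketch (argument names and array layout are assumptions)
def step(state, iidx, arrayTimeIndex, globalTimeStep):
    """Advance every operating point in iidx by one time step.
    state is assumed to be the shared (time, variable, x, y) array;
    scheme is the module-level global set by set_globals below."""
    for idx, idy in iidx:
        #mirrors the SCHEME branch of the CUDA step in example.cu
        state[arrayTimeIndex + 1, :, idx, idy] = state[arrayTimeIndex, :, idx, idy] + 1
    return state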
#set_globals in example.py
import numpy
import pycuda.driver as cuda

def set_globals(*args,source_mod=None):
    """Use this function to set cpu global variables"""
    global dt,dx,dy,scheme #scheme is true for the one-step scheme
    t0,tf,dt,dx,dy,scheme = args
    if source_mod is not None:
        keys = "DT","DX","DY"
        nargs = args[2:]
        fc = lambda x:numpy.float64(x) #float64 to match the double constants in example.cu
        for i,key in enumerate(keys):
            ckey,_ = source_mod.get_global(key)
            cuda.memcpy_htod(ckey,fc(nargs[i]))
        ckey,_ = source_mod.get_global("SCHEME")
        cuda.memcpy_htod(ckey,numpy.intc(scheme))
#set_globals for a second equation set (an Euler-type solver); numpy and pycuda.driver are imported as above
def set_globals(*args,source_mod=None):
    """Use this function to set cpu global variables"""
    global dtdx,dtdy,gamma,gM1
    t0,tf,dt,dx,dy,gamma = args
    dtdx,dtdy = dt/dx,dt/dy
    gM1 = gamma-1
    if source_mod is not None:
        keys = "DT","DX","DY","GAMMA","GAM_M1","DTDX","DTDY"
        nargs = args[2:]+(gamma-1,dt/dx,dt/dy)
        fc = lambda x:numpy.float64(x)
        for i,key in enumerate(keys):
            ckey,_ = source_mod.get_global(key)
            cuda.memcpy_htod(ckey,fc(nargs[i]))
The final function is the CUDA step function that runs on the GPU.
//code in example.cu
__device__ __constant__ double DX;
__device__ __constant__ double DY;
__device__ __constant__ double DT;
__device__ __constant__ int SCHEME; //Determine which scheme to use

__device__
void step(double * shared_state, int idx, int globalTimeStep)
{
    if (SCHEME)
    {
        shared_state[idx+TIMES] = shared_state[idx]+1;
    }
    else
    {
        bool cond = ((globalTimeStep)%ITS==0); //True on complete steps, not intermediate ones
        int timeChange = cond ? 1 : 0; //If true, rolls back a time step
        int addition = cond ? 2 : 1; //If true, uses 2 instead of 1 for the intermediate step
        shared_state[idx+TIMES] = shared_state[idx-TIMES*timeChange]+addition;
    }
}
The main entry point for the code is a script paired with an input yaml file. Staying with the current example, example.yaml could look like this:
swept: True
filename: "example.hdf5"
verbose: True
blocksize: 8
share: 0.5
dtype: float64
globals: # passed to set_globals as (t0, tf, dt, dx, dy, ...)
- 0    # t0
- 10   # tf
- 0.1  # dt
- 0.1  # dx
- 0.1  # dy
intermediate_steps: 1
operating_points: 1
cpu: ./pysweep-git/pysweep/equations/example.py
gpu: ./pysweep-git/pysweep/equations/example.cu
The associated script portion would look like
import os
import pysweep
import pysweep.equations.example
#path is assumed to point at the directory containing the inputs folder

filename = pysweep.equations.example.createInitialConditions(1,384,384)
yfile = os.path.join(path,"inputs")
yfile = os.path.join(yfile,"example.yaml")
testSolver = pysweep.Solver(filename,yfile)
testSolver.simulation=True
testSolver()
The output of calling the solver object will look something like this
Setting up processes...
Creating time step data...
Creating output file...
Creating shared memory arrays and process functions...
Cleaning up solver...
Running simulation...
Pysweep simulation properties:
simulation type: swept
number of processes: 2
gpu source: ./pysweep-git/pysweep/equations/example.cu
cpu source: /home/anthony-walker/nrg-swept-project/pysweep-git/pysweep/equations/example.py
output file: example.hdf5
verbose: True
Array shape: (103, 1, 384, 384)
dtype: float64
share: 0.5
blocksize: 8
stencil size: 3
intermediate time steps: 1
time data (t0,tf,dt): (0,10,0.1)
Cleaning up processes...
Done in 7.3623316287994385 seconds...
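Once the run finishes, the results are in the HDF5 file named in the yaml. A minimal sketch for inspecting it with h5py (the dataset names written by the solver are not assumed here, so they are simply listed):

#inspect the solver output after the run
import h5py
with h5py.File("example.hdf5", "r") as f:
    print(list(f.keys()))     #see what the solver actually wrote
    for name in f.keys():
        print(name, f[name])  #h5py prints shape/dtype for datasets, member count for groups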
This code is set up to run two examples from the command line. Running pysweep -h
will give the full list of options. To run an example with MPI, use something like
mpiexec -n 2 pysweep -f heat -nx 16 -nt 10 -b 8 -s 1 --swept --verbose
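Presumably, dropping the --swept flag runs the same case with the standard (unswept) solver for comparison:
mpiexec -n 2 pysweep -f heat -nx 16 -nt 10 -b 8 -s 1 --verbose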
This code is intended to implement the swept space-time decomposition rule in two dimensions and is paired with a standard solver for comparison.