/ds_dmh

Primary LanguageRoffBSD 2-Clause "Simplified" LicenseBSD-2-Clause

Contents of this README file:
 * About DataSpaces
 * Building and Installing DataSpaces
 * DataSpaces APIs
 * Test Applications
 * FAQ


About DataSpaces
=================

This package contains the source code and two test applications for
DataSpaces run-time library, which is a flexible interaction and
coordination framework to support coupled applications.

DataSpaces is a distributed framework built on a dynamic set of nodes
of a HPC cluster, which implements a virtual shared-space abstraction
that can be accessed concurrently by all applications in a coupled
simulation workflow. The coupled applications can coordinate and
interact using DataSpaces by inserting and fetching scientific data
of interest through the distributed in-memory space.

DataSpaces provides a simple, yet powerful interface similar to the
tuple-space model, e.g., dspaces_put()/dspaces_get() to insert/retrieve data
objects into/from the virtual space.


Building and Installing DataSpaces
==================================

Please read the INSTALL file.


DataSpaces APIs
===============

Please read header files include/dataspaces.h and include/dimes_interface.h for
detailed API documentation. 


Test Applications
=================

We include two set of test code in the package. They are under tests directory.
The test is a simple workflow composed of two applications (a writer and a
reader), which interact through DataSpaces at runtime. The workflow works as
follow: the writer processes generates array of floating points number, obtain
the write lock, write it to the staging area, then release the write lock. The
reader processes first obtain the read lock, read data from the staging area, 
and release the read lock. This simple write/read data interaction can repeat
for a number of iterations defined by user.

The writer/reader programs are written in C and Fortran and are located in
C and Fortran directory respectively. The DataSpaces staging server code is
written in C and in C directory.

The test workflow include the following main source files:

        'tests/C/dataspaces_server.c': DataSpaces staging server
        'tests/C/test_writer.c' : writer application
        'tests/C/test_reader.c' : reader application
        'tests/Fortran/test_put.F90' : writer application
        'tests/Fortran/test_get.F90' : reader application


1. Building the example
------------------------

Under the tests directory (tests/C or tests/Fortran execute

        $ make

to create the following binary executables
'/tests/C/dataspaces_server',  '/tests/C/test_reader', 'tests/C/test_writer',
'tests/Fortran/test_put' and '/tests/Fortran/test_get'. Before running Fortran
test code, user needs to copy the staging server executable (dataspaces_server
) to Fortran directory. 


2. Preparing the configuration file for DataSpaces
---------------------------------------------------

User need to create a configuration file dataspaces.conf under C or Fortran
test directory. dataspaces.conf file should be edited according to the specific
requirements of the applications. A sample configuration file is included under
scripts/dataspaces.conf.

Please read the FAQ section on how to set the configuration parameters for
dataspaces.conf file.


3. How to run test workflow
---------------------------

In the scripts/fortran_test and scripts/c_test directory, we provide a number 
of PBS job script examples. These files should be modified according to
the specific workflow scenario and the system you plan to run DataSpaces.

The DataSpaces server executable has three command line options:

  --server, -s    number of server instance/process
  --cnodes, -c    number of client processes
  --conf, -f      path to the configuration file

In the test/Fortran directory, the test application executables are 'test_put'
and 'test_get'. Running the executables requires 8 arguments as arg1-arg8:

  arg1 : number of total application processes
  arg2 : number of dimensions for application data domain
  arg3 : number of processes in x direction (of the process layout)
  arg4 : number of processes in y direction (of the process layout)
  arg5 : number of processes in z direction (of the process layout)
  arg6 : block size per process in x direction (of the app data domain)
  arg7 : block size per process in y direction (of the app data domain)
  arg8 : block size per process in z direction (of the app data domain)
  arg9 : number of iterations

Here is an example:
  dataspaces_server -s 4 -c 72 
  test_put 64 3 4 4 4 256 256 256 50 
  test_get 8 3 4 2 1 256 512 1024 50 

In the test/C directory, the test application executables are 'test_writer'
and 'test_reader'. Running the executables requires the following arguments:
  test_writer transfer_method npapp ndim np_1 ... np_ndim sp_1 .. sp_ndim timestep appid elem_size num_vars
  test_reader transfer_method npapp ndim np_1 ... np_ndim sp_1 .. sp_ndim timestep appid elem_size num_vars

  transfer_method: data transfer method (DATASPACES or DIMES).
  npapp: number of total application processes.
  ndim: number of dimensions for application data domain.
  np_1~np_ndim: number of application processes in dimension 1~ndim (of the process layout).
  sp_1~sp_ndim: block size per process in dimension 1~ndim (of the app data domain).
  timestep: number of iteration.
  appid: application id.
  elem_size: Optional. Size of one element for the application array. Default value is 8 (bytes).
  num_vars: Optional. Number of array variables to be transferred between writer and reader application. Default value is 1.

Here is an example:
  dataspaces_server -s 4 -c 72
  test_writer DATASPACES 64 3 4 4 4 256 256 256 50 1
  test_reader DATASPACES 8 3 4 2 1 256 512 1024 50 2

Both Fortran and C tests are using the same dataspaces_server executable written in C.

Note: if you would like to write your own job script, please add command rm -f conf
srv.lck to your script in order to remove config files from previous run.


FAQ
===

1. How to set configure parameters for dataspaces.conf
------------------------------------------------------

Running DataSpaces requires users to create a customized dataspaces.conf file.
The configuration file has the following parameters:

(1) ndim: number of dimensions for the default application data domain.
(2) dims: size of each dimension for the default application data domain.

DataSpaces needs the global dimension information of application data to build
the distributed indexing. Users can use the above two configuration parameters
to specify a default global dimension. For example, in a workflow where
applications exchange multiple 2D arrays with the same global dimension as
256x128, the parameters can be set as below.
ndim = 2
dims = 256,128

Note: For large values of ndim and dims, initialization of dataspaces_server will take significantly
more time. The only way to make sure dataspaces_clients to wait enough for the dataspaces_server 
initialization is to run the server and have an approximate estimate on the time clients should wait
for the server.

Since DataSpaces 1.4.0 release, user application can explicitly set the global
dimension for each variable using dspaces_define_gdim() and dimes_define_gdim()
APIs (see include/dataspaces.h and include/dimes_interface.h for detailed
documentation). In this case, the default global dimension is only used for
application variables whose global dimension is not explicitly defined. 

(3) max_versions: maximum number of versions of a data object to be cached
in DataSpaces servers.

DataSpaces servers can cache multiple versions of a data object in memory.
Specifically, the servers cache most recent 'max_version' versions of a data
object. When newer version is inserted into DataSpaces, older version of the
data object is freed in a FIFO manner. Version information is explicitly defined
by user application when calling dspaces_put() API.

Note: in current implementation, 'max_version' is only useful for DataSpaces
transport method. The value of 'max_version' does not affect DIMES transport
method.

(4) max_readers: maximum number of reader applications in the workflow.

(5) lock_type: type of DataSpaces locking service. Valid values: 1 - generic, 2 - custom.

Lock type 1 implements the exclusive lock. Application can acquire the lock by
calling dspaces_lock_on_write()/dspaces_lock_on_read(), as long as the lock is
not acquired by another application. 

Lock type 2 implements a customized lock that enforces writer/reader
synchronization. Writer application can always acquire the lock when calling
dspaces_lock_on_write() for the first time. In the subsequent calls to
dspaces_lock_on_write(), acquiring the lock requires all the reader applications
first call dspaces_unlock_on_read(). For reader applications, acquiring the lock
requires the writer application first call dspaces_unlock_on_write().

For example, the writer application calls the lock/unlock functions in the
following order:
    // Writer
    dspaces_lock_on_write("var1", NULL);
    ...
    dspaces_unlock_on_write("var1", NULL);

    dspaces_lock_on_write("var1", NULL);
    ...
    dspaces_unlock_on_write("var1", NULL);

And the reader application calls the lock/unlock functions in the following
order:
    // Reader
    dspaces_lock_on_read("var1", NULL);
    ...
    dspaces_unlock_on_read("var1", NULL);

Lock type 2 would ensure the applications acquire the lock in the following
order:
    writer acquires lock for "var1";
    reader acquires lock for "var1";
    writer acquires lock for "var1";


2. Set the number of DataSpaces servers
---------------------------------------

Running DataSpaces requires users to specify the number of servers.

(1) Set the number of servers for DataSpaces transport method:
When DataSpaces transport method (e.g. dspaces_put()/dspaces_get()) is used,
application data is cached in the memory of DataSpaces server processes.
Two major factors need to be considered when choosing the number
of servers.

First, estimated total size of application data. Because all the application data
is cached in the DRAM of server processes, total data size would have to fit in
severs' memory aggregation. For example, the workflow applications write a total of 16GB
data in a single time step, and the max_version is set as 10. DataSpaces servers
need to cache an estimated size of 160GB data in memory. Assuming we run a single
DataSpaces server on one physical compute node (of HPC cluster), and each compute
nodes has about 8GB DRAM. As a result, we need to execute at least 10 DataSpaces
servers for this workflow.
 
Second, total number of application processes in the workflow. During the
execution of dspaces_put()/dspaces_get(), it involves a lot of messaging and data
communications between application processes and DataSpaces servers. As a
result, when the number of application processes is increased, it is recommended
to also increase the number of DataSpaces servers, which would help reduce the
messaging and data communications load on individual server. For example, if you
run 8 servers for 1024 application processes, it is better to run 128 servers
for 16384 application processes instead of keeping the number of servers
unchanged even if the total data size is fixed.

(2) Set the number of servers for DIMES transport method:
When DIMES transport methods (e.g. dimes_put()/dimes_get()) is used, application
data is directly cached in the application memory. DataSpaces server is used to
index the cached data objects, and locate data objects for application's
dimes_get() queries. In this case, the total size of application data does not
affect the number of servers. However, because the application processes still
need to interact with the servers, the second factor (above) still needs to be
considered.

For more information on installing DataSpaces and running the examples, or
if you have any other questions, please visit the following web page:
http://personal.cac.rutgers.edu/TASSL/projects/data/support.html