GEMBench (Gpu-accelerated Embedded Mdp testBENCH)

For usage questions, or to contribute solvers, parsers, or other components, contact Adrian Sapio: asapio@umd.edu

Background:

This software package contains MDP datasets, parsers, solvers and reference solutions. It was developed on an NVIDIA Jetson embedded GPU development board. It is intended to be portable to other CUDA-capable Linux platforms, but has currently only been tested on the Jetson TX-1.

The following paper details the motivation behind this software and also contains a detailed description of it:

A. Sapio, R. Tatiefo, S. Bhattacharyya, and M. Wolf. GEMBench: A platform for collaborative development of GPU accelerated embedded Markov decision systems. In Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos, Greece, July 2019.

Software organization:

The provided files are organized in the following way:

gembench/scripts/ <-- Automation scripts (e.g. to solve all of the MDPs in an entire dataset, etc).

gembench/src/ <-- Source code for executables that need to be compiled

gembench/src/main.cpp <-- The main() function for the "gembench" executable

gembench/src/solvers/* <-- Source code for multiple MDP solvers

gembench/src/parsers/format1 <-- Source code for routines that parse MDPs stored in format 1 of N

gembench/src/parsers/format2 ... gembench/src/parsers/formatN <-- Likewise for formats 2 through N

gembench/datasets/ <-- Example MDPs that can be read in by an appropriate parser and then solved by a solver

gembench/datasets/dataset1 <-- Files containing MDPs from dataset 1 of N

gembench/datasets/dataset2 ... gembench/datasets/datasetN <-- Likewise for datasets 2 through N

gembench/datasets/datasetA/solutions/solverB <-- Reference solutions generated by running solver B on dataset A

Setup:

  • (On the Jetson board) Verify that the CUDA toolkit is installed by running "nvcc -V". Expected output:
ubuntu@tegra-ubuntu:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Sun_Nov_19_03:16:56_CST_2017
Cuda compilation tools, release 9.0, V9.0.252
  • Clone this GEMBench software package onto the Jetson board.
  • (On the Jetson board) Install CMake (needed to compile some GEMBench programs), using "sudo apt-get install cmake"
  • (On the Jetson board) Build the GEMBench program using the following commands:
cd gembench  
cd src  
mkdir build  
cd build/  
cmake ../  
make  
  • At this point an executable called "gembench" should exist in the build directory.
  • Run "gembench --help" (or run it with no arguments) to see the command-line usage instructions.

MDP Datasets

In the current release, some MDP datasets that can be downloaded directly from the web are not packaged with GEMBench. A Python script is provided that downloads those MDP files directly onto the Jetson board:

From gembench/datasets, run "python download_datasets.py"
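The download step amounts to fetching each dataset file into its dataset directory. A hedged sketch of that step is below; the URL and destination directory are placeholders for illustration only (the real locations are listed inside download_datasets.py):

```shell
# Hedged sketch of one dataset-download step. EXAMPLE_URL and DEST are
# placeholders, not real GEMBench dataset locations.
EXAMPLE_URL="https://example.org/mdps/tiger.pomdp"
DEST="dataset_demo"

# Create the destination directory that a fetched file would land in.
mkdir -p "$DEST"

# On a networked board the fetch would be something like:
#   wget -P "$DEST" "$EXAMPLE_URL"
echo "would fetch $EXAMPLE_URL into $DEST/"
```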

Solve a single MDP

From gembench/src/build, run:

gembench -m /path/to/my/foo.pomdp -s solver_name

To solve all of the MDPs in a dataset, run one of the bash scripts in gembench/scripts.
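A minimal sketch of such a batch script, assuming the dataset layout described above; the directory name and solver name are placeholders, and the real scripts in gembench/scripts may pass additional options:

```shell
#!/bin/sh
# Hedged sketch: run a chosen solver on every .pomdp file in a dataset
# directory. DATASET_DIR and SOLVER are illustrative defaults.
DATASET_DIR="${1:-demo_dataset}"
SOLVER="${2:-solver_name}"

# Create a throwaway demo dataset so the sketch runs end to end.
mkdir -p "$DATASET_DIR"
touch "$DATASET_DIR/tiger.pomdp" "$DATASET_DIR/maze.pomdp"

for mdp in "$DATASET_DIR"/*.pomdp; do
    # Replace 'echo' with the real invocation once gembench is built:
    #   gembench -m "$mdp" -s "$SOLVER"
    echo "gembench -m $mdp -s $SOLVER"
done
```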

Power Measurement:

See: https://elinux.org/Jetson/TX1_Power_Monitor

Version Info

GEMBench was tested on a Jetson TX-1 running L4T 28.2 installed from Jetpack 3.3 with the following CUDA packages installed:

ubuntu@tegra-ubuntu:/usr/local$ dpkg-query -l | grep cuda
ii  cuda-command-line-tools-9-0                9.0.252-1                                     arm64        CUDA command-line tools  
ii  cuda-core-9-0                              9.0.252-1                                     arm64        CUDA core tools  
ii  cuda-cublas-9-0                            9.0.252-1                                     arm64        CUBLAS native runtime libraries  
ii  cuda-cublas-dev-9-0                        9.0.252-1                                     arm64        CUBLAS native dev links, headers  
ii  cuda-cudart-9-0                            9.0.252-1                                     arm64        CUDA Runtime native Libraries  
ii  cuda-cudart-dev-9-0                        9.0.252-1                                     arm64        CUDA Runtime native dev links, headers  
ii  cuda-cufft-9-0                             9.0.252-1                                     arm64        CUFFT native runtime libraries  
ii  cuda-cufft-dev-9-0                         9.0.252-1                                     arm64        CUFFT native dev links, headers  
ii  cuda-curand-9-0                            9.0.252-1                                     arm64        CURAND native runtime libraries  
ii  cuda-curand-dev-9-0                        9.0.252-1                                     arm64        CURAND native dev links, headers  
ii  cuda-cusolver-9-0                          9.0.252-1                                     arm64        CUDA solver native runtime libraries  
ii  cuda-cusolver-dev-9-0                      9.0.252-1                                     arm64        CUDA solver native dev links, headers  
ii  cuda-cusparse-9-0                          9.0.252-1                                     arm64        CUSPARSE native runtime libraries  
ii  cuda-cusparse-dev-9-0                      9.0.252-1                                     arm64        CUSPARSE native dev links, headers  
ii  cuda-documentation-9-0                     9.0.252-1                                     arm64        CUDA documentation  
ii  cuda-driver-dev-9-0                        9.0.252-1                                     arm64        CUDA Driver native dev stub library  
ii  cuda-libraries-dev-9-0                     9.0.252-1                                     arm64        CUDA Libraries 9.0 development meta-package  
ii  cuda-license-9-0                           9.0.252-1                                     arm64        CUDA licenses  
ii  cuda-misc-headers-9-0                      9.0.252-1                                     arm64        CUDA miscellaneous headers  
ii  cuda-npp-9-0                               9.0.252-1                                     arm64        NPP native runtime libraries  
ii  cuda-npp-dev-9-0                           9.0.252-1                                     arm64        NPP native dev links, headers  
ii  cuda-nvgraph-9-0                           9.0.252-1                                     arm64        NVGRAPH native runtime libraries  
ii  cuda-nvgraph-dev-9-0                       9.0.252-1                                     arm64        NVGRAPH native dev links, headers  
ii  cuda-nvml-dev-9-0                          9.0.252-1                                     arm64        NVML native dev links, headers  
ii  cuda-nvrtc-9-0                             9.0.252-1                                     arm64        NVRTC native runtime libraries  
ii  cuda-nvrtc-dev-9-0                         9.0.252-1                                     arm64        NVRTC native dev links, headers  
ii  cuda-repo-l4t-9-0-local                    9.0.252-1                                     arm64        cuda repository configuration files  
ii  cuda-samples-9-0                           9.0.252-1                                     arm64        CUDA example applications  
ii  cuda-toolkit-9-0                           9.0.252-1                                     arm64        CUDA Toolkit 9.0 meta-package  

When built, the "gembench" executable was linked to the following shared libraries:

gembench/src/build$ ldd gembench 
        linux-vdso.so.1 =>  (0x0000007fb49b6000)  
        libcusparse.so.9.0 => /usr/local/cuda-9.0/lib64/libcusparse.so.9.0 (0x0000007fb17af000)  
        libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007fb1768000)  
        libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007fb1755000)  
        librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007fb173d000)  
        libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007fb15ae000)  
        libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007fb158d000)  
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007fb1445000)  
        /lib/ld-linux-aarch64.so.1 (0x0000005588941000)  
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007fb1398000)  

The cuda_init() function in the source code prints the following when verbose mode is enabled:

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"  
  CUDA Driver Version / Runtime Version          9.0 / 9.0  
  CUDA Capability Major/Minor version number:    5.3  
  Total amount of global memory:                 3984 MBytes (4177342464 bytes)  
  ( 2) Multiprocessors, (128) CUDA Cores/MP:     256 CUDA Cores  
  GPU Clock rate:                                998 MHz (1.00 GHz)  
  Memory Clock rate:                             13 Mhz  
  Memory Bus Width:                              64-bit  
  L2 Cache Size:                                 262144 bytes  
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)  
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers  
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers  
  Total amount of constant memory:               65536 bytes  
  Total amount of shared memory per block:       49152 bytes  
  Total number of registers available per block: 32768  
  Warp size:                                     32  
  Maximum number of threads per multiprocessor:  2048  
  Maximum number of threads per block:           1024  
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)  
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)  
  Maximum memory pitch:                          2147483647 bytes  
  Texture alignment:                             512 bytes  
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)  
  Run time limit on kernels:                     Yes  
  Integrated GPU sharing Host Memory:            Yes  
  Support host page-locked memory mapping:       Yes  
  Alignment requirement for Surfaces:            Yes  
  Device has ECC support:                        Disabled  
  Device supports Unified Addressing (UVA):      Yes  
  Device PCI Bus ID / PCI location ID:           0 / 0  
  Compute Mode:  
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >