
A benchmark framework for POWER and x86_64

Primary language: Mathematica. License: Apache License 2.0 (Apache-2.0).

Scope

Scope may be downloaded from https://github.com/c3sr/scope/releases


A benchmark framework developed by the IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR) in collaboration with the IMPACT group at the University of Illinois.

Primary maintainers:

Project Advisors:

  • Prof. Wen-mei Hwu (UofI)
  • Dr. Jinjun Xiong (IBM Research)

Various benchmark suites using Scope are under development:

  • Comm|Scope - CUDA/NUMA data transfer performance (Carl Pearson, UIUC)
  • NCCL|Scope - GPU collective communication performance (Sarah Hashash, UIUC)
  • Histo|Scope - CUDA histogram techniques (Carl Pearson, UIUC)
  • DDL|Scope - IBM Distributed Deep Learning Library benchmarks (Vandana Kulkarni, UIUC)
  • TCU|Scope - CUDA/TCU performance primitives (Abdul Dakkak, UIUC)
  • FrameworkLayer|Scope - Evaluation of neural network layers across frameworks (Cheng Li and Abdul Dakkak, UIUC)
  • CUDNN|Scope - Evaluation of neural network layers using CuDNN (Cheng Li and Abdul Dakkak, UIUC)
  • Misc|Scope - experimental or miscellaneous benchmarks

Quick Start

  • Install CMake 3.12+
  • Clone, check out the latest release, update submodules to match, and build:
git clone https://github.com/c3sr/scope.git --recursive
cd scope
git checkout v1.3.2                 # or the latest, `git tag --list`
git submodule update                # match benchmark versions
mkdir build && cd build
cmake .. -DENABLE_COMM=ON           # or other scopes
make -j`nproc`
./scope --benchmark_list_tests=true # list all scopes

Install CMake >= 3.12

User install of CMake 3.12 (preferred)

If your system has CMake < 3.12, we suggest installing CMake 3.12+ in the user's $HOME directory.

On x86-64, the following will download CMake 3.12.0 and install it in $HOME/software/cmake-3.12.0.

cd /tmp
wget https://cmake.org/files/v3.12/cmake-3.12.0-Linux-x86_64.sh
mkdir -p $HOME/software/cmake-3.12.0
sh cmake-3.12.0-Linux-x86_64.sh --prefix=$HOME/software/cmake-3.12.0 --exclude-subdir

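You will then need to add $HOME/software/cmake-3.12.0/bin to your PATH, ahead of any older system CMake so that the new version is found first. On most Linux systems, you can do this by adding the following line to your $HOME/.bashrc:

export PATH="$HOME/software/cmake-3.12.0/bin:$PATH"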

On ppc64le, you will need to download the CMake source from the CMake website and build it.
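
Whichever way CMake was installed, it can help to confirm that the shell now picks up a new enough version. The following is a minimal sketch, not part of Scope: the function names version_at_least and check_cmake are hypothetical, and it assumes that the first line of `cmake --version` has the form "cmake version X.Y.Z". Only the major and minor numbers are compared, and they are compared numerically, so 3.5 is correctly treated as older than 3.12.

```shell
# Succeed if dotted version $1 is at least dotted version $2.
# Only major.minor are compared; both are compared as integers.
version_at_least() {
    have_major=${1%%.*}; rest=${1#*.}; have_minor=${rest%%.*}
    want_major=${2%%.*}; rest=${2#*.}; want_minor=${rest%%.*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Report whether the cmake found in PATH satisfies the 3.12 requirement.
check_cmake() {
    # Assumption: the first output line looks like "cmake version 3.12.0".
    found=$(cmake --version 2>/dev/null | awk 'NR == 1 { print $3 }')
    if [ -z "$found" ]; then
        echo "cmake not found in PATH"
        return 1
    fi
    if version_at_least "$found" 3.12; then
        echo "cmake $found is new enough"
    else
        echo "cmake $found is older than 3.12"
        return 1
    fi
}
```

Running check_cmake in a new shell (after .bashrc has been re-read) shows which CMake is actually first in the PATH.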

System install of CMake 3.12

If you do not already know how to do a system install, this is probably not the right option for you. First, uninstall any existing system install of CMake. Then, follow the user install instructions above, but choose a system prefix for the installation.

Compile

To compile the project, make sure nvcc is in your $PATH (it is typically at /usr/local/cuda/bin/nvcc), then run the following commands:

git clone https://github.com/c3sr/scope.git --recursive
cd scope
mkdir -p build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

The build system uses Hunter to download all dependencies. If you have trouble downloading dependencies, check that Hunter/CMake can use SSL. Alternatively, you can forgo Hunter entirely and provide your own dependencies.

You will need to enable the particular scopes that provide the benchmarks you want to run:

Scope      CMake Option
CuDNN      -DENABLE_CUDNN=1
NCCL       -DENABLE_NCCL=1
Comm       -DENABLE_COMM=1
Example    -DENABLE_EXAMPLE=1 (default)
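
Several scopes can be enabled in a single cmake invocation by passing one option per scope. As a small convenience sketch, the hypothetical helper scope_flags below builds those options from a list of scope names. It assumes every scope follows the -DENABLE_<NAME>=1 pattern shown in the table, which is an inference from the listed options and not something Scope itself provides.

```shell
# Print one -DENABLE_<NAME>=1 option per scope name given as an argument.
# Assumption: every scope uses the ENABLE_<UPPERCASE NAME> option pattern.
scope_flags() {
    flags=""
    for name in "$@"; do
        upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
        flags="$flags -DENABLE_${upper}=1"
    done
    printf '%s\n' "${flags# }"   # drop the leading space
}
```

For example, `cmake -DCMAKE_BUILD_TYPE=Release $(scope_flags comm nccl) ..` expands to a configure command with -DENABLE_COMM=1 and -DENABLE_NCCL=1.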

If you get errors about nvcc not supporting your gcc version, you may want to point CUDA at an older host compiler:

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_HOST_COMPILER=`which gcc-6` .. 

You can optionally choose your own CUDA archs that you would like to be compiled:

cmake -DNVCC_ARCH_FLAGS="2.0 2.1 3.0 3.2 3.5 3.7 5.0 5.2 5.3" ..

The accepted syntax is the same as the CUDA_SELECT_NVCC_ARCH_FLAGS syntax in the FindCUDA module.

You can disable or enable individual scopes:

cmake -DENABLE_MISC=0 ...

The submodules should automatically be checked out. If not, try checking them out yourself:

git submodule update --init --recursive

or, to update the submodules to the latest commits on their remote tracking branches:

git submodule update --recursive --remote

Available Benchmarks

The available benchmarks and descriptions are listed here. You can list all the benchmarks with

./scope --benchmark_list_tests=true

You can filter the benchmarks that are run with a regular expression passed to --benchmark_filter:

./scope --benchmark_filter=[regex]

For example:

./scope --benchmark_filter=SGEMM

Further controls over the benchmarks are described in the output of the --help option.
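
Because some benchmarks take a long time, it can be useful to preview which names a regular expression would select before running anything. The sketch below uses a hypothetical helper, preview_filter, that applies grep -E to a list of benchmark names on stdin. This is only an approximation: the regular expression dialect used by --benchmark_filter may differ from grep's extended syntax in corner cases.

```shell
# Print the benchmark names on stdin that match the extended regex in $1.
# A match count goes to stderr; the exit status is 1 if nothing matches.
preview_filter() {
    matches=$(grep -E -- "$1") || {
        echo "no benchmark matches '$1'" >&2
        return 1
    }
    printf '%s\n' "$matches"
    echo "$(printf '%s\n' "$matches" | wc -l | tr -d ' ') benchmark(s) match" >&2
}
```

A typical use would be `./scope --benchmark_list_tests=true | preview_filter '^SGEMM/50/'`, followed by `./scope --benchmark_filter='^SGEMM/50/'` once the selection looks right.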

Run all the benchmarks

This is not generally recommended, as it will take quite some time.

./scope

The above will print something like the following to stdout:

------------------------------------------------------------------------------
Benchmark                       Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
SGEMM/1000/1/1/-1/1             5 us          5 us     126475 K=1 M=1000 N=1 alpha=-1 beta=1
SGEMM/128/169/1728/1/0        539 us        534 us       1314 K=1.728k M=128 N=169 alpha=1 beta=0
SGEMM/128/729/1200/1/0       1042 us       1035 us        689 K=1.2k M=128 N=729 alpha=1 beta=0
SGEMM/192/169/1728/1/0        729 us        724 us        869 K=1.728k M=192 N=169 alpha=1 beta=0
SGEMM/256/169/1/1/1             9 us          9 us      75928 K=1 M=256 N=169 alpha=1 beta=1
SGEMM/256/729/1/1/1            35 us         35 us      20285 K=1 M=256 N=729 alpha=1 beta=1
SGEMM/384/169/1/1/1            18 us         18 us      45886 K=1 M=384 N=169 alpha=1 beta=1
SGEMM/384/169/2304/1/0       2475 us       2412 us        327 K=2.304k M=384 N=169 alpha=1 beta=0
SGEMM/50/1000/1/1/1            10 us         10 us      73312 K=1 M=50 N=1000 alpha=1 beta=1
SGEMM/50/1000/4096/1/0       6364 us       5803 us        100 K=4.096k M=50 N=1000 alpha=1 beta=0
SGEMM/50/4096/1/1/1            46 us         45 us      13491 K=1 M=50 N=4.096k alpha=1 beta=1
SGEMM/50/4096/4096/1/0      29223 us      26913 us         20 K=4.096k M=50 N=4.096k alpha=1 beta=0
SGEMM/50/4096/9216/1/0      55410 us      55181 us         10 K=9.216k M=50 N=4.096k alpha=1 beta=0
SGEMM/96/3025/1/1/1            55 us         51 us      14408 K=1 M=96 N=3.025k alpha=1 beta=1
SGEMM/96/3025/363/1/0        1313 us       1295 us        570 K=363 M=96 N=3.025k alpha=1 beta=0
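
For a quick look at which cases dominate the run time, console output in the format above can be filtered with awk. This is a minimal sketch under stated assumptions: slow_benchmarks is a hypothetical helper, and it only considers rows whose Time column is reported in "us", as in the sample output; rows in other units, the header, and the separator lines are ignored.

```shell
# Print "name time" for every row whose Time column exceeds $1 microseconds.
# Reads console output on stdin; only rows with the unit "us" are considered.
slow_benchmarks() {
    awk -v limit="$1" '$3 == "us" && $2 + 0 > limit + 0 { print $1, $2 }'
}
```

For example, `./scope --benchmark_filter=SGEMM | slow_benchmarks 1000` lists the SGEMM cases that took longer than one millisecond.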

Output as JSON using

./scope --benchmark_out_format=json --benchmark_out=test.json

or preferably

./scope --benchmark_out_format=json --benchmark_out=`hostname`.json

Repeat benchmark runs with

./scope --benchmark_repetitions=5

Plot Benchmark JSON files

Try the ScopePlot Python package.

pip install scope_plot

On Minsky With PowerAI

cd build && rm -fr * && OpenBLAS=/opt/DL/openblas cmake -DCMAKE_BUILD_TYPE=Release .. -DOpenBLAS=/opt/DL/openblas

Disable CPU frequency scaling

If you see this warning:

***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.

you might want to disable CPU frequency scaling while running the benchmark. On Ubuntu, install the cpupower tool:

apt install linux-tools-$(uname -r)

then switch to the performance governor for the run and restore powersave afterwards:

sudo cpupower frequency-set --governor performance
./scope
sudo cpupower frequency-set --governor powersave
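
To confirm that the governor change took effect on every core before starting a long run, the current setting can be read back from sysfs. The sketch below is an illustration, not part of Scope: non_performance_cpus is a hypothetical helper, and it assumes the standard Linux cpufreq layout under /sys/devices/system/cpu. The optional argument exists only so the function can be pointed at a different directory.

```shell
# Print every CPU whose scaling governor is not "performance".
# $1 (optional): sysfs CPU directory, default /sys/devices/system/cpu.
non_performance_cpus() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue        # cpufreq not exposed for this CPU
        gov=$(cat "$f")
        if [ "$gov" != "performance" ]; then
            cpu=${f#"$root"/}          # strip the leading directory
            echo "${cpu%%/*}: $gov"    # e.g. "cpu3: powersave"
        fi
    done
}
```

If non_performance_cpus prints nothing, every CPU that exposes a governor is set to performance.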

Run with Docker

Install nvidia-docker, then list the available benchmarks:

nvidia-docker run --rm raiproject/microbench:amd64-latest bench --benchmark_list_tests

You can run benchmarks in the following way (probably with the --benchmark_filter flag).

nvidia-docker run --privileged --rm -v `readlink -f .`:/data -u `id -u`:`id -g` raiproject/microbench:amd64-latest ./numa-separate-process.sh dgx bench /data/sync2
  • --privileged is needed to set the NUMA policy if NUMA benchmarks are to be run.
  • -v `readlink -f .`:/data maps the current directory into the container as /data.
  • --benchmark_out=/data/`hostname`.json tells the bench binary to write the JSON output files to /data in the container, which is mapped to the current directory.
  • -u `id -u`:`id -g` tells docker to run as user id -u and group id -g, which is the current user and group. This means that files that docker produces will be modifiable from the host system without root permission.

Hunter Toolchain File

If some of the third-party code compiled by Hunter needs a different compiler, you can create a CMake toolchain file that sets CMake variables to be used globally when building that code. You can then pass this file to CMake:

cmake -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake ...
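
As an illustration of what such a file might contain, the sketch below writes a minimal toolchain file that selects the C and C++ compilers. CMAKE_C_COMPILER and CMAKE_CXX_COMPILER are standard CMake variables; the helper name write_toolchain and the compiler names in the usage note are assumptions chosen for the example, not requirements of Scope or Hunter.

```shell
# Write a minimal CMake toolchain file.
# $1: output path, $2: C compiler, $3: C++ compiler
write_toolchain() {
    cat > "$1" <<EOF
# Compilers used for every project configured with this toolchain file
set(CMAKE_C_COMPILER "$2")
set(CMAKE_CXX_COMPILER "$3")
EOF
}
```

For example, `write_toolchain toolchain.cmake gcc-6 g++-6` creates a file that can be passed with -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake, assuming gcc-6 and g++-6 are installed and in the PATH.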

Adding a new benchmark

If you would like to develop a benchmark suite, read here for more information. Also, check out Example|Scope for a template to get started.

Third-Party Resources