Medical-Compute-using-ChRIS-on-the-MOC-PowerPC-and-x86_64-GPU-usage-and-benchmarking

Introduction to ChRIS

Table of Content

1. Vision and Goals Of The Project
2. Users/Personas Of The Project
3. Scope and Features Of The Project
4. Solution Concept
5. Acceptance criteria
6. Release Planning

Final Project Demonstration

Contributors

Mentor: Rudolph Pienaar (rudolphpienaar)

Group Members:


1. Vision and Goals Of The Project

The overall vision of the ChRIS project is to develop a plugin based on ChRIS platform so that users like developers or administrators are able to do benchmarking on cloud network topologies that include different architectures like x86 and PowerPC.

We are going to get familiar with Mass Open Cloud, ChRIS platform, ChRIS plugins and benchmarking methods in order to integrate all the components.

Our benchmarking plug-in will be the first ChRIS plug-in that can test performance of the ChRIS platform.

High-level goals include:

  • Improve the functions of our benchmarking ChRIS plugin

  • Use the real work environment to further improve the ChRIS plugin to better benchmark

2. Users/Personas Of The Project

  • User Persona Examples:

As a ChRIS developer / administrator, I would like to have a way to test how my plugin performs on different architectures such as x86 vs PowerPC therefore I want a ChRIS plugin that performs benchmarking tests on these architectures.

  • Non-target users are:

Clinicians / Technicians / Patients who may use ChRIS platform but don't do or care about benchmarking between different architectures.

3. Scope and Features Of The Project

We will focus on one plug-in which provides a series of tools and test functions to test the performance of the system. Based on what we decide is feasible, the test functions may range from a simple matrix multiplying to huge neural network training. These test functions will represent real workloads that may be deployed on the system. For example, if the real functions move data between main memory and GPU memory frequently, our functions are supposed to show this feature.

However, we are not focusing on building a precious and complex machine learning model or data processing method. All test functions will run fast and estimate the time that may be spent on running real computing tasks. These tests will run in an acceptable time span, like several minutes. Therefore, they should emulate the real ChRIS workloads as light as possible. Since there is no reason to run a benchmarking task for 8 hours rather than run a real task for 8 hours, we will ESTIMATE the performance.

At last, this plugin will produce comparable results that allow users to compare the performance of different platforms in an elegant and easy method.

4. Solution Concept

Global Architectural Structure Of the Project:

This section provides a high-level architecture or a conceptual diagram showing the scope of the solution. If wireframes or visuals have already been done, this section could also be used to show how the intended solution will look. This section also provides a walkthrough explanation of the architectural structure.

5. Acceptance criteria

Correctly developed a runnable ChRIS plugin that to some extent presents the performance differences between different platform architectures. The minimal product presents the benchmark differences between x86 and PowerPC.

6. Release Planning

The release planning section describes how the project will deliver incremental sets of features and functions in a series of releases to completion. Identification of user stories associated with iterations that will ease/guide sprint planning sessions is encouraged. Higher-level details for the first iteration is expected.

Get familiar with the ChRIS platform, either from web-app/ terminal operations.

Set up environments for future development. (Linux/ docker)

Research on how to use MOC(Mass Open Cloud)

Build a simple benchmarking programm and run it locally, e.g. Matrix-multiplying.

Research on a more complex benchmarking programm, e.g. 'Real-Time Object Detection on GPU'.

Be able to run operations on the MOC computers

Be able to run plugins from the local ChRIS instance

Be able to run a pre-existent plugin via ChRIS on the MOC GPUs.

Develop benchmarking metrics to analyze plugin processes.

Integrate our plugin into the ChRIS platform.

Get more granular with benchmarking metrics


Our team gave a class talk on Spark, which is a distributed computing framework designed for applications.


Design Implications and Discussion:

The goal for the ChRIS platform is to provide a containerized application that is made up of many plugins which run specific functions on inputs. The scope for our portion of the project is to develop one plugin that runs a function to benchmark performance between different architectures. The reason for this design is to make the plugin easy to use and integrate with a ChRIS developers workflow.

The implications for our global architecture design are to allow ChRIS developers to use ChRIS to benchmark different architectures to find which one maximize their ChRIS plugin performance.


Product Documentation

I. Matrix Multiplication Plugin

Docker Images

Source code

Description


                                 _        _                      _ _   _       _       
                                | |      (_)                    | | | (_)     | |      
                 _ __ ___   __ _| |_ _ __ ___  ___ __ ___  _   _| | |_ _ _ __ | |_   _ 
                | '_ ` _ \ / _` | __| '__| \ \/ / '_ ` _ \| | | | | __| | '_ \| | | | |
                | | | | | | (_| | |_| |  | |>  <| | | | | | |_| | | |_| | |_) | | |_| |
                |_| |_| |_|\__,_|\__|_|  |_/_/\_\_| |_| |_|\__,_|_|\__|_| .__/|_|\__, |
                                                                        | |       __/ |
                                                                        |_|      |___/ 

This is a benchmarking plugin for ChRIS platform of Boston Children's Hospital (What is ChRIS?)on both x86_64 and PowerPC MOC using matrix multiplication.

Plugin Info Content Description
Input matrix parameters COE value to indicate matrix size, given by command line
Output one .csv file Contains matrix size and running time for multiplication task

Usage

Requirements

Your host computer should be a linux os and installed CUDA 10.1 && nvidia container.

Docker run on x86_64

First pull docker image to local environment:

docker pull fnndsc/pl-matrixmultiply_moc_x86_64

Then you can run it with parameters:

docker run --runtime=nvidia                                         \
            -e NVIDIA_VISIBLE_DEVICES=1                             \
            -v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing          \
            fnndsc/pl-matrixmultiply                                \
            matmultiply.py                                          \
            -c 32,32,128                                            \
            /incoming /outgoing

Parameters and meaning below in the table:

docker run parameters for x86_64
parameters function example
--runtime=nvidia tells the docker to use the nvidia docker --runtime=nvidia
-e specifies the visible graphic device id -e NVIDIA_VISIBLE_DEVICES=1
-v specify input and outgoing folder, check docker volume bind -v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing
image_name specify docker image name fnndsc/pl-matrixmultiply
script_name specify script file to run matmultiply.py
-c specify COE value to define matrix size, format like: start_value, gap_value, end_value -c 32,32,128

Docker run on PowerPC

First pull docker image to local environment:

docker pull fnndsc/pl-matrixmultiply_moc_ppc64

Then you can run it with parameters:

docker run  --security-opt label=type:nvidia_container_t            \
            -v $(pwd):/incoming:z -v $(pwd)/out:/outgoing:z         \
            fnndsc/pl-matrixmultiply_moc_ppc64                      \
            matmultiply.py                                          \
            -c 32,32,128                                            \
            /incoming /outgoing

Parameters and meaning below in the table:

docker run parameters for PowerPC
parameters function example
--security-opt label=type:nvidia_container_t tells the docker to use the nvidia docker --security-opt label=type:nvidia_container_t
-v specify input and outgoing folder, check docker volume bind -v $(pwd):/incoming:z -v $(pwd)/out:/outgoing:z
image_name specify docker image name fnndsc/pl-matrixmultiply_moc_ppc64
script_name specify script file to run matmultiply.py
-c specify COE value to define matrix size, format like: start_value, gap_value, end_value -c 32,32,128

Build Instructions

For ppc64le image, we cannot use the automatic build on docker hub. We have to build this conatiner locally and push it into docker hub.

cd path/to/this/repo
docker login docker.io -u [your docker.io username]
docker build -f Dockerfile -t docker.io/fnndsc/pl-matrixmultiply_moc_ppc64 
docker push "docker.io/fnndsc/pl-matrixmultiply_moc_ppc64"

Example

x86_64

docker run --runtime=nvidia                                         \
            -e NVIDIA_VISIBLE_DEVICES=1                             \
            -v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing          \
            fnndsc/pl-matrixmultiply                                \
            matmultiply.py                                          \
            -c 32,32,128                                            \
            /incoming /outgoing

PowerPC

docker run  --security-opt label=type:nvidia_container_t            \
            -v $(pwd):/incoming:z -v $(pwd)/out:/outgoing:z         \
            fnndsc/pl-matrixmultiply_moc_ppc64                      \
            matmultiply.py                                          \
            -c 32,32,128                                            \
            /incoming /outgoing

Research and Development References

Workflow

What this plugin simply does is, when assigned with the COE parameter (-c), it will generate a list of COE values, for example, -c 32,32,128 will generate four parameters of COE as: [32, 64, 96, 128], and it is not necessary to assign the end value as exactly divisible by gap value. (which means you can go with -c 32,32,100.

For each COE value in this list, it will generate square matrix with size to be (COE x TPB) ^2, where TPB value is determined as 32 in the program, and therefore you have different sizes of matrix to do multiplication.

The program will record this running time along with start time, end time and matrix size, and generate a .csv file in your output folder, that's what you can use for benchmarking purpose.

Troubleshoot

Make sure your /out directory has corresponding authorization, you can try:

mkdir in out && chmod 777 out                                       

Related Links

https://medium.com/datathings/benchmarking-blas-libraries-b57fb1c6dc7

II. Object Detection Plugin

Docker image

Source code

Description

                  ___  _     _           _     ____       _            _   _             
                 / _ \| |__ (_) ___  ___| |_  |  _ \  ___| |_ ___  ___| |_(_) ___  _ __  
                | | | | '_ \| |/ _ \/ __| __| | | | |/ _ \ __/ _ \/ __| __| |/ _ \| '_ \ 
                | |_| | |_) | |  __/ (__| |_  | |_| |  __/ ||  __/ (__| |_| | (_) | | | |
                 \___/|_.__// |\___|\___|\__| |____/ \___|\__\___|\___|\__|_|\___/|_| |_|
                          |__/                                                           


This is a GPU benchmarking plugin for ChRIS platform of Boston Children's Hospital (What is ChRIS?)on both x86_64 and PowerPC MOC using object detection.

Plugin Info Content Description
Input one video file Target file to be tested with object detection
Output one .csv file Contains test results maximum frame per second, minimum frame per second and average frame per second

Usage

Requirements

Your host computer should be a linux os and installed CUDA 10.1 && nvidia container.

Main Dependencies: ffmpeg opencv-python tensorflow tensorrt

Docker run on x86_64

First pull docker image to local environment:

docker pull docker.io/fnndsc/pl-objectdetection_x86

Then you can run it with parameters:

docker run --runtime=nvidia                                         \
            -e NVIDIA_VISIBLE_DEVICES=1                             \
            -v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing          \
            docker.io/fnndsc/pl-objectdetection_x86                 \
            objectdetection.py                                      \
            -f animal360p.webm                                      \
            /incoming /outgoing

Parameters and meaning below in the table:

docker run parameters for x86_64
parameters function example
--runtime=nvidia tells the docker to use the nvidia docker --runtime=nvidia
-e specifies the visible graphic device id -e NVIDIA_VISIBLE_DEVICES=1
-v specify input and outgoing folder, check docker volume bind -v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing
image_name specify docker image name docker.io/fnndsc/pl-objectdetection_x86
script_name specify script file to run objectdetection.py
-f, --file specify input file for object detection in input folder -f animal360p.webm

Docker run on PowerPC

First pull docker image to local environment:

docker pull docker.io/fnndsc/docker.io/fnndsc/pl-objectdetection_moc_ppc64

Then you can run it with parameters:

docker run --security-opt label=type:nvidia_container_t     \
           -v $(pwd):/incoming:z -v $(pwd)/out:/outgoing:z  \
           docker.io/fnndsc/pl-objectdetection_moc_ppc64     \
           objectdetection.py                               \
           -f animal360p.webm                               \
           /incoming /outgoing

Parameters and meaning below in the table:

docker run parameters for PowerPC
parameters function example
--security-opt label=type:nvidia_container_t tells the docker to use the nvidia docker --security-opt label=type:nvidia_container_t
-v specify input and outgoing folder, check docker volume bind -v $(pwd):/incoming:z -v $(pwd)/out:/outgoing:z
image_name specify docker image name docker.io/fnndsc/pl-matrixmultiply_moc_ppc64
script_name specify script file to run objectdetection.py
-f, --file specify input file for object detection in input folder -f animal360p.webm

Build Instructions

For ppc64le image, we cannot use the automatic build on docker hub. We have to build this conatiner locally and push it into docker hub.

cd path/to/this/repo
docker login docker.io -u [your docker.io username]
docker build -f Dockerfile -t docker.io/fnndsc/pl-objectdetection_moc_ppc64 .
docker push "docker.io/fnndsc/pl-objectdetection_moc_ppc64"

Example

For both x86_64 and PowerPC, please check FNNDSC/objectdetection_example repo for example usage:objectdetection_example

Research and Development References

Workflow

workflow

This graph show the workflow of the original python script (provided by nVidia). One of the difference in our scripts is that we use a file instead of the web camera as graph source. Therefore, this container have to use the ffmpeg to decode the video file. Also, since there is no graphic interface in the server/moc/openshift, we removed the realtime progress showing codes and replace it by saving the output to an output file (output.avi) in the outgoing directiory. This means ffmpeg is essentical.

There is another output file called FramePerSecondRecord.csv. This file contains the benchmarking results of the plugin. The output should be like this:

maximum_fps minimum_fps average_fps
250.0 142.86 239.92

If you wanna more research details of this project, check this tutorial.

(If you run it multiple times , the newest result will be added to the last line of file.)

(Results from a ppc64le machine)

This shows the information about the inference time for every frame. We think it shows the data bus latency from cpu/main memory to the GPU.

Benchmarking result

On ppc64le machine, the typical inference time for each frame is about 4 ms. However in x86_64 machine, we got about 6~7 ms inference time for every frame. We think the difference is significant (powerpc is about 40% faster than x86_64).

Troubleshoot

Error opening video stream or file

This means the opencv didn't open the video file successfully. Check:

  • If the file exist
  • If the input video coding format is supported by curent version ffmpeg.

Failed to establish a new connection

Please contact the machine administrator to ensure the docker has the internet access ability.

Related Links

Most python scripts in this repo is forked from this tensorRT example provided by Nvidia:

https://github.com/NVIDIA/object-detection-tensorrt-example

ChRIS Workflow Documentation

ChRIS Plugin Workflow on Titan

ChRIS Plugin Workflow on Titan

Running pman and pfioh on Power9 Cluster in the Mass Open Cloud

pman ppc64le pfioh ppc64le

If you want to pull the docker container

docker pull docker.io/emslade/pman.ppc64le
docker pull docker.io/emslade/pfioh.ppc64le

Step 1: Log in to Power9 Openshift

Step 2: Navigate to your project. Here you will see a "Add to Project" option in the top right hand corner.

Step 3: Click on Add to Project -> Deploy Image

Step 4: Now, check on Image Name.

  • if you want to deploy pfioh, include emslade/pfioh.ppc64le in the text box
  • if you want to deploy pman, include emslade/pman.ppc64le in the text box

Step 5: Then, click Deploy !

Step 6: After pman and pfioh are deployed, you can say hello to these services.