/gpu-camera-sample

Realtime GPU processing software (Windows, Linux, ARM) for machine vision camera applications. Performance benchmarks and Glass-to-Glass time measurements.


gpu-camera-sample

Camera sample application with realtime GPU image processing (Windows, Linux, Jetson)


The software is based on the following image processing pipeline for camera applications:

  • Raw image capture (monochrome or bayer 8-bit, 12-bit packed/unpacked, 16-bit)
  • Import to GPU
  • Raw data conversion and unpacking
  • Linearization curve
  • Bad Pixel Correction
  • Dark frame subtraction
  • Flat-Field Correction
  • White Balance
  • Exposure Correction (brightness control)
  • Debayer with HQLI (5×5 window), L7 (7×7 window), DFPD (11×11), MG (23×23) algorithms
  • Color correction with 3×3 matrix
  • Wavelet-based denoiser
  • Crop / Resize / Flip / Flop / Rotate
  • Gamma (linear, sRGB)
  • JPEG / MJPEG encoding/decoding
  • H.264 and HEVC encoding/decoding
  • Output to monitor via OpenGL
  • Export from GPU to CPU memory
  • MJPEG and H.264/H.265 streaming
  • Storage of compressed images/video to SSD
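Two of the stages listed above, the 3×3 color correction matrix and the sRGB gamma, can be sketched on the CPU for a single pixel. This is only an illustrative reference under stated assumptions, not Fastvideo SDK code: the names `Rgb`, `Matrix3x3`, `applyColorMatrix` and `srgbEncode` are hypothetical, pixel values are assumed to be normalized to [0, 1], and the gamma curve is the standard sRGB transfer function.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical helper types for this sketch (not part of Fastvideo SDK).
using Rgb = std::array<double, 3>;
using Matrix3x3 = std::array<std::array<double, 3>, 3>;

// Multiply a linear RGB pixel by a 3x3 color correction matrix
// and clamp each channel to [0, 1].
Rgb applyColorMatrix(const Matrix3x3& m, const Rgb& in)
{
    Rgb out{};
    for (int row = 0; row < 3; ++row) {
        double sum = 0.0;
        for (int col = 0; col < 3; ++col)
            sum += m[row][col] * in[col];
        out[row] = std::min(1.0, std::max(0.0, sum));
    }
    return out;
}

// Encode a linear value in [0, 1] with the standard sRGB transfer function.
double srgbEncode(double linear)
{
    assert(linear >= 0.0 && linear <= 1.0);
    if (linear <= 0.0031308)
        return 12.92 * linear;                           // linear segment near black
    return 1.055 * std::pow(linear, 1.0 / 2.4) - 0.055;  // power-law segment
}
```

In the real pipeline these operations run on the GPU for every pixel; the sketch only shows the arithmetic of each stage.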

Processing is done on an NVIDIA GPU to speed up the pipeline. The software can also work with raw Bayer images in PGM format, which you can use for testing, or if you don't have a camera or your camera is not supported. More info about the project can be found here.

Benchmarks on NVIDIA Quadro RTX 6000 or GeForce RTX 2080 Ti show that GPU-based raw image processing is very fast and can offer high image quality at the same time. Total performance can reach 4 GPix/s for color cameras. Performance strongly depends on the complexity of the pipeline. Multi-GPU solutions can improve performance significantly.
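To put the 4 GPix/s figure in perspective, it can be converted into an upper bound on frame rate for a given sensor resolution. The sketch below is plain arithmetic; the function name `maxFramesPerSecond` and the 4096×3072 example resolution are assumptions chosen for illustration, and the result ignores camera interface, host-to-device transfer and pipeline complexity.

```cpp
#include <cassert>

// Upper bound on frame rate for a given processing throughput (pixels/s)
// and frame size. Hypothetical helper for illustration only.
double maxFramesPerSecond(double pixelsPerSecond, int width, int height)
{
    assert(pixelsPerSecond > 0.0 && width > 0 && height > 0);
    return pixelsPerSecond / (static_cast<double>(width) * height);
}
```

For example, `maxFramesPerSecond(4e9, 4096, 3072)` gives roughly 318 frames per second for a 12.6-megapixel sensor.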

Currently the software works with XIMEA cameras via XIMEA SDK, FLIR cameras via Spinnaker SDK, and Imperx cameras via Imperx SDK.

Via GenICam the software can work with XIMEA, MATRIX VISION, Basler, FLIR, Imperx, JAI, Daheng Imaging cameras.

We are going to add support for Emergent Vision Technologies, IDS Imaging Development Systems, Baumer, and Kaya Instruments cameras soon. You can also add support for other cameras yourself. The software works with the demo version of Fastvideo SDK, which is why you see a watermark on the screen. To get a Fastvideo SDK license for development and deployment, please contact Fastvideo.

How to build gpu-camera-sample

Requirements for Windows

  • Camera SDK or GenICam package + camera vendor GenTL producer (.cti). Currently XIMEA, MATRIX VISION, Basler, FLIR, Imperx, JAI, Daheng Imaging cameras are supported
  • Fastvideo SDK (demo) ver.0.16.3.0
  • NVIDIA CUDA-10.2
  • Qt ver.5.13.1
  • Compiler MSVC 2017 or later

Requirements for Linux

  • Ubuntu 18.04 (x64 or Arm64)
  • Camera SDK or GenICam package + camera vendor GenTL producer (.cti). Currently XIMEA, MATRIX VISION, Basler, FLIR, Imperx, JAI, Daheng Imaging cameras are supported
  • Fastvideo SDK (demo) ver.0.16.0.0
  • NVIDIA CUDA-10.2 for x64 and ARM64 platform
  • Compiler gcc 7.4 or later
  • Qt 5 (qtbase5-dev)
sudo apt-get install qtbase5-dev qtbase5-dev-tools qtcreator git
  • FFmpeg libraries
sudo apt-get install  libavutil-dev libavcodec-dev libavdevice-dev libavfilter-dev libavformat-dev libavresample-dev libx264-dev

Jetson users have to build FFmpeg libraries from sources. See this shell script for details.

  • Libjpeg and zlib libraries
sudo apt-get install libjpeg-dev zlib1g-dev

Build instructions

  • Obtain source code:
git clone https://github.com/fastvideo/gpu-camera-sample.git 

For Windows users

You can also download precompiled libraries from here

  • By default the application is built with no camera support. The only available option is the camera simulator, which works with PGM files.
  • Open <ProjectRoot>/src/GPUCameraSample.pro in Qt Creator.
  • Open common_defs.pri
  • To enable GenICam support, uncomment DEFINES += SUPPORT_GENICAM
  • To enable XIMEA camera support, uncomment DEFINES += SUPPORT_XIMEA
  • To enable FLIR camera support, uncomment DEFINES += SUPPORT_FLIR
  • To enable Imperx camera support, uncomment DEFINES += SUPPORT_IMPERX. Since Imperx SDK uses GenApi version 3.0.2, please open common.pri, uncomment GENAPIVER = VC140_v3_0 and comment GENAPIVER = VC141_v3_2
  • Build the project
  • Binaries will be placed into <ProjectRoot>/GPUCameraSample_x64 folder.

For Linux users

From here on we assume you put the source code into your home directory, so the project root is ~/gpu-camera-sample

  • Make sure file <ProjectRoot>/Scripts/make_links.sh is executable
chmod 755 ~/gpu-camera-sample/Scripts/make_links.sh
  • Create the OtherLibsLinux folder in the project root folder. This folder will contain the external libraries used by the gpu-camera-sample application.
  • Download Fastvideo SDK for the x64 platform from Fastvideo SDK (demo) for Linux Ubuntu 18.04, 64-bit, or for the Arm64 platform from Fastvideo SDK (demo) for NVIDIA Jetson Nano, TX2, Xavier, and unpack it into the <ProjectRoot>/OtherLibsLinux/FastvideoSDK folder. Copy all files from <ProjectRoot>/OtherLibsLinux/FastvideoSDK/fastvideo_sdk/lib to <ProjectRoot>/OtherLibsLinux/FastvideoSDK/fastvideo_sdk/lib/Linux64 for the x64 platform, or to <ProjectRoot>/OtherLibsLinux/FastvideoSDK/fastvideo_sdk/lib/Arm64 for the Arm64 platform.
  • Create links to Fastvideo SDK *.so files
cd ~/gpu-camera-sample/Scripts
./make_links.sh ~/gpu-camera-sample/OtherLibsLinux/FastvideoSDK/fastvideo_sdk/lib/Linux64
  • If you need direct XIMEA camera support, download XiAPI from https://www.ximea.com/support/wiki/apis/XIMEA_Linux_Software_Package. Unpack and install downloaded package.
  • If you need GenICam support
    • Download GenICamTM Package Version 2019.11 (https://www.emva.org/wp-content/uploads/GenICam_Package_2019.11.zip).
    • Unpack it to a temporary folder and cd to Reference Implementation folder.
    • Create <ProjectRoot>/OtherLibsLinux/GenICam folder.
    • Unpack GenICam_V3_2_0-Linux64_x64_gcc48-Runtime.tgz or GenICam_V3_2_0-Linux64_ARM_gcc49-Runtime.tgz into <ProjectRoot>/OtherLibsLinux/GenICam folder.
    • Unpack GenICam_V3_2_0-Linux64_x64_gcc48-SDK.tgz or GenICam_V3_2_0-Linux64_ARM_gcc49-SDK.tgz into <ProjectRoot>/OtherLibsLinux/GenICam/library/CPP
    • Ensure Qt uses gcc, not clang to build project.
  • By default the application is built with no camera support. The only available option is the camera simulator, which works with PGM files.
  • Open <ProjectRoot>/src/GPUCameraSample.pro in Qt Creator.
  • Open common_defs.pri
  • To enable GenICam support, uncomment DEFINES += SUPPORT_GENICAM.
  • To enable XIMEA camera support, uncomment DEFINES += SUPPORT_XIMEA
  • To enable Imperx camera support, uncomment DEFINES += SUPPORT_IMPERX. Since Imperx SDK uses GenApi version 3.0.2, please open common.pri, uncomment GENAPIVER = VC140_v3_0 and comment GENAPIVER = VC141_v3_2
  • FLIR support is experimental at the moment. Use it at your own risk.
  • Build the project.
  • If GenICam support is enabled, set the environment variable GENICAM_GENTL64_PATH to the full path of the camera vendor GenTL producer (.cti) library before running the application.
  • Binaries will be placed into the <ProjectRoot>/GPUCameraSample_Arm64 or GPUCameraSample_Linux64 folder. To run the application from the terminal, run GPUCameraSample.sh. The necessary symbolic links are created at build time.

You can also download precompiled libraries from here

How to work with NVIDIA Jetson to get maximum performance

NVIDIA Jetson provides many features related to power management, thermal management, and electrical management. These features deliver the best user experience possible given the constraints of a particular platform. The target user experience ensures the perception that the device provides:

  • Uniformly high performance
  • Excellent battery life
  • Perfect stability

The nvpmodel utility has to be used to change the power mode. The mode with the maximum power consumption is MAXN. To activate this mode, call

sudo /usr/sbin/nvpmodel -m 0

You also have to run the jetson_clocks script to maximize Jetson performance by setting the static maximum frequencies of the CPU, GPU, and EMC clocks. You can also use the script to show the current clock settings, store them into a file, and restore clock settings from a file.

sudo /usr/bin/jetson_clocks

NVIDIA Jetson TX2 has two CPU core types: Denver2 and A57. While benchmarking Fastvideo SDK we found that the J2K encoder and decoder perform better on the A57 cores, so an affinity mask has to be assigned to run the process only on A57 cores. The Linux command taskset assigns process affinity:

taskset -c 3,4,5 myprogram

TX2 has the following core numbers: 0 – A57; 1, 2 – Denver2; 3, 4, 5 – A57. Core 0 is used by Linux for interrupt processing. We do not recommend including it in the user affinity mask.
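The same affinity can also be set from inside a program instead of through taskset. The following is a minimal Linux-only sketch using the glibc `sched_setaffinity` call; the helper names `makeCpuSet` and `pinCurrentThread` are hypothetical, and the core numbers 3, 4, 5 are taken from the TX2 layout described above.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros; g++ usually defines it already
#endif
#include <sched.h>

#include <cassert>
#include <initializer_list>

// Build an affinity mask from a list of core numbers (hypothetical helper).
cpu_set_t makeCpuSet(std::initializer_list<int> cores)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int core : cores) {
        assert(core >= 0 && core < CPU_SETSIZE);
        CPU_SET(core, &set);
    }
    return set;
}

// Restrict the calling thread to the given cores.
// Returns true on success, false if the kernel rejects the mask
// (for example, when none of the requested cores exist on this machine).
bool pinCurrentThread(const cpu_set_t& set)
{
    return sched_setaffinity(0, sizeof(cpu_set_t), &set) == 0;
}
```

On a TX2, calling `pinCurrentThread(makeCpuSet({3, 4, 5}))` early in `main` has the same intent as `taskset -c 3,4,5 myprogram`: threads created afterwards inherit the mask.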

Glass-to-Glass Time Measurements

To check system latency, we have implemented G2G tests in the gpu-camera-sample application.

We have the following choices for G2G tests:

  • The camera captures a frame showing the current time from a high-resolution timer displayed on the monitor. We send the data from the camera to the software, do image processing on the GPU, and then show the processed image on the same monitor, close to the window with the timer. If we stop the software, we see two different times, and their difference is the system latency.
  • A more complicated solution: after image processing on the GPU we do JPEG encoding (MJPEG on CPU or on GPU), then send the MJPEG stream to a receiver process, which parses and decodes the MJPEG stream and outputs the frames to the monitor. Both processes (sender and receiver) run on the same PC.
  • The same solution as in the previous approach, but with H.264 encoding/decoding (CPU or GPU). Both processes run on the same PC.

We can also measure the latency for the case when we stream compressed data from one PC to another over the network. Latency depends on camera frame rate, monitor fps, NVIDIA GPU performance, network bandwidth, complexity of the image processing pipeline, etc.
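Since a single G2G reading depends on where the capture falls relative to the camera and monitor refresh cycles, it can help to take several readings and summarize them. The sketch below is a hypothetical helper, not part of gpu-camera-sample: each `TimerReading` holds the value shown by the live timer and the value visible in the processed image at the moment the software was stopped, both in milliseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One G2G observation (hypothetical structure for this sketch).
struct TimerReading {
    double timerMs;  // value shown by the live timer window
    double imageMs;  // timer value visible inside the processed image
};

struct LatencyStats {
    double minMs;
    double meanMs;
    double maxMs;
};

// Latency of one reading is timerMs - imageMs; summarize over all readings.
LatencyStats summarizeLatency(const std::vector<TimerReading>& readings)
{
    assert(!readings.empty());
    const double first = readings.front().timerMs - readings.front().imageMs;
    LatencyStats stats{first, 0.0, first};
    double sum = 0.0;
    for (const TimerReading& r : readings) {
        const double latency = r.timerMs - r.imageMs;
        assert(latency >= 0.0);  // the image can never be ahead of the timer
        stats.minMs = std::min(stats.minMs, latency);
        stats.maxMs = std::max(stats.maxMs, latency);
        sum += latency;
    }
    stats.meanMs = sum / static_cast<double>(readings.size());
    return stats;
}
```

Reporting the minimum and maximum alongside the mean shows how much of the measured latency is jitter from frame and refresh timing.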

Software architecture

gpu-camera-sample is a multithreaded application. It consists of the following threads:

  • Main application thread to control app GUI and other threads.
  • Camera image acquisition thread. Controls camera data acquisition and the CUDA-based image processing thread.
  • CUDA-based image processing thread. Controls RAW data processing, the async data writing thread, and the OpenGL rendering thread.
  • OpenGL rendering thread. Renders processed data into OpenGL surface.
  • Async data writing thread. Writes processed JPEG/MJPEG data to SSD or streams processed video.

Here we've implemented the simplest approach for a camera application. The camera driver writes raw data to a memory ring buffer, then we copy data from that ring buffer to the GPU for computations. The full image processing pipeline runs on the GPU, so we only need to collect the processed frames at the output.
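The hand-off between the acquisition thread and the processing thread can be pictured as a bounded ring buffer with one producer and one consumer. The class below is a simplified sketch of that idea under stated assumptions, not the application's actual implementation: `FrameRing`, `tryPush` and `pop` are hypothetical names, and dropping a frame when the ring is full is one possible policy among several.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Fixed-capacity ring buffer shared by a producer and a consumer thread.
template <typename Frame>
class FrameRing {
public:
    explicit FrameRing(std::size_t capacity) : slots_(capacity)
    {
        assert(capacity > 0);
    }

    // Producer side: store a frame, or return false (frame dropped)
    // when the consumer has fallen behind and the ring is full.
    bool tryPush(Frame frame)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == slots_.size())
            return false;
        slots_[(head_ + count_) % slots_.size()] = std::move(frame);
        ++count_;
        notEmpty_.notify_one();
        return true;
    }

    // Consumer side: block until a frame is available, then take the oldest one.
    Frame pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return count_ > 0; });
        Frame frame = std::move(slots_[head_]);
        head_ = (head_ + 1) % slots_.size();
        --count_;
        return frame;
    }

private:
    std::vector<Frame> slots_;
    std::size_t head_ = 0;   // index of the oldest stored frame
    std::size_t count_ = 0;  // number of frames currently stored
    std::mutex mutex_;
    std::condition_variable notEmpty_;
};
```

In this picture the acquisition thread calls `tryPush` for every captured frame, and the processing thread calls `pop` before copying the frame to the GPU.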

In the general case, Fastvideo SDK can import/export data from/to SSD / CPU memory / GPU memory. This is done to ensure compatibility with third-party libraries on CPU and GPU. You can get more info in the Fastvideo SDK Manual.

Using gpu-camera-sample

  • Run GPUCameraSample.exe (on Linux, run GPUCameraSample.sh)
  • Press the Open button on the toolbar. This will open the first camera in the system, or ask you to open a PGM file (Bayer or grayscale) if the application was built with no camera support.
  • Press the Play button. This will start data acquisition from the camera and display it on the screen.
  • Adjust zoom with the Zoom slider or toggle the Fit check box if required.
  • Select the appropriate output format in the Recording pane (please check that the output folder exists in the file system, otherwise nothing will be recorded) and press the Record button to start recording to disk.
  • Press Record button again to stop the recording.

Minimum Hardware and Software Requirements for desktop application

  • Windows-7/10, Ubuntu 18.04 64-bit
  • The latest NVIDIA driver
  • NVIDIA GPU with Kepler architecture, 6xx series minimum
  • NVIDIA GPU with 4-8-12 GB memory or better
  • Intel Core i5 or better
  • NVIDIA CUDA-10.2
  • Compiler MSVC 2019 for Windows or gcc 7.4.0 for Linux

We also recommend checking the PCI-Express bandwidth for Host-to-Device and Device-to-Host transfers. For a GPU with Gen3 x16 it should be in the range of 10-12 GB/s. GPU memory size could be a bottleneck for image processing from high resolution cameras, so please check GPU memory usage in the software.
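The quoted 10-12 GB/s range translates directly into a per-frame transfer time, which is useful when budgeting latency. The sketch below is simple arithmetic; the function names `rawFrameBytes` and `transferTimeMs` and the 4096×3072, 16-bit example frame are assumptions for illustration, and 1 GB is taken as 10^9 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Size of an unpacked raw frame in bytes (hypothetical helper).
std::size_t rawFrameBytes(int width, int height, int bytesPerPixel)
{
    assert(width > 0 && height > 0 && bytesPerPixel > 0);
    return static_cast<std::size_t>(width) * height * bytesPerPixel;
}

// Time in milliseconds to move frameBytes over a link with the given
// sustained bandwidth in GB/s (1 GB = 1e9 bytes).
double transferTimeMs(std::size_t frameBytes, double gigabytesPerSecond)
{
    assert(gigabytesPerSecond > 0.0);
    return static_cast<double>(frameBytes) / (gigabytesPerSecond * 1e9) * 1e3;
}
```

For a 4096×3072 frame stored as 16-bit values (about 25.2 MB), this gives roughly 2.5 ms at 10 GB/s and 2.1 ms at 12 GB/s per host-to-device copy.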

If you are working with images that reside on an HDD, please move them to an SSD or M.2 drive.

For testing purposes you can use the latest NVIDIA GeForce RTX 2060, 2070, 2080 Ti or Jetson Nano, TX2, Xavier NX and AGX Xavier.

For continuous high performance applications we recommend professional NVIDIA Quadro and Tesla GPUs.

Multi-camera applications

To run the software for multi-camera setups, we recommend running one process per camera. If you have enough GPU memory and the processing performance is sufficient, this is the simplest solution, and it has been tested in many applications. This is also a good choice for Linux solutions; please don't forget to turn on CUDA MPS.

You can also create a software module that collects frames from different cameras and processes them in the same pipeline within the gpu-camera-sample application. In that case you will need less GPU memory, which could be important for embedded solutions.

Roadmap

  • GPU pipeline for monochrome cameras - done
  • GenICam Standard support - done
  • Linux version - done
  • Software for NVIDIA Jetson hardware and L4T for CUDA-10.2 (Jetson Nano, TX2, Xavier AGX and NX) - done
  • Glass-to-Glass (G2G) test for latency measurements - done
  • Support for XIMEA, MATRIX VISION, Basler, FLIR, Imperx, JAI, Daheng Imaging cameras - done
  • MJPEG and H.264 streaming with or without FFmpeg RTSP - done
  • HEVC (H.265) encoder/decoder - done
  • Real-time Image Processing on NVIDIA GPU with Basler pylon - done
  • Benchmarks for Jetson Xavier NX - done
  • CUDA-11.3 support - in progress
  • Support for Emergent Vision Technologies, DALSA, IDS Imaging, Baumer, Kaya Instruments, SVS-Vistek cameras - in progress
  • Transforms to Rec.601 (SD), Rec.709 (HD), Rec.2020 (4K)
  • RAW Bayer codec
  • JPEG2000 encoder and decoder on GPU for camera applications
  • Interoperability with FFmpeg, UltraGrid, and GStreamer

Info

Fastvideo SDK Benchmarks

Downloads