CAVA is a library for building and simulating camera vision pipelines. It is written to work with the gem5-aladdin SoC simulator.
CAVA consists of two parts:
- An Image Signal Processor (ISP), implemented as a configurable five-stage pipeline
- A DNN framework (SMAUG).
SMAUG provides several reference implementations, along with a model of an actual SoC containing multiple DNN accelerators.
We will install CAVA's dependencies, then build and run it on a simple example.
We have tested it only on Linux, but theoretically it should work wherever you can run gem5-aladdin. Let us know if you encounter any issues building or executing on other systems.
CAVA depends on `libconfuse` for reading its configuration files. For example, you can install it on Ubuntu with:
apt-get install libconfuse-dev
The `scripts/load_and_convert.py` script converts between raw images and binary arrays. If you want to use this script, you will need to install the `imageio` library:
pip install imageio
git clone git@github.com:yaoyuannnn/cava.git
(Note that we have included the SSH URL for the repository, not the HTTPS one.)

Set the environment variable `$CAVA_HOME` to the cloned directory; it is used in `cava/cam_vision_pipe/src/common/main.c`.
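For example, assuming you cloned CAVA into your home directory (adjust the path to match your checkout):

export CAVA_HOME=~/cava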
In the same directory where you cloned CAVA, clone the gem5-aladdin repository:
# recursively clone aladdin and xenon dependencies
git clone --recursive git@github.com:harvard-acc/gem5-aladdin.git
After the aladdin repository has been recursively cloned into the `gem5-aladdin/src` subdirectory, set your `$ALADDIN_HOME` environment variable to the path of `gem5-aladdin/src/aladdin`. This environment variable determines the paths in the build files, so you will see errors when building if you forget to set it.
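For example, assuming gem5-aladdin sits next to CAVA in your home directory (adjust to your layout):

export ALADDIN_HOME=~/gem5-aladdin/src/aladdin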
First, make sure you are using `gcc` by setting your `$CC` environment variable to `gcc`, which is used in the build files. (If you use Clang, you will likely see a bunch of unrecognized warning flags and run into issues with the `.func` and `.endfunc` directives used by `-gstabs` debugging.)
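For example:

export CC=gcc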
To build and run the default camera vision pipeline:
make native
cd sim
sh run_native.sh
An Image Signal Processor (ISP) converts the raw pixels produced by camera sensors to useful images.
The default ISP kernel is modeled after the Nikon D7000 camera. It contains a five-stage camera pipeline:
- Demosaicing: Interpolate undersampled sensors to produce a mosaic of RGB pixel intensities
- Denoising: Reduce noise in image
- Color Space Conversion / White Balancing: Preserve neutrality of neutral colors
- Gamut Mapping: Map to restricted available colors of output device without compromising the original image
- Tone Mapping: Map to restricted dynamic range of output device without compromising the original image
The purpose and implementation of each pipeline stage is discussed in more detail below. See `cam_vision_pipe/src/cam_pipe/kernels/pipe_stages.c` for the corresponding implementation details.
Demosaicing uses a color filter array (CFA) over each photosite of the sensor to interpolate local undersampled colors into a true color at each pixel. A common CFA is the Bayer filter, which contains more green than red and blue pixels because human vision is more sensitive to green. The filter operation yields a "mosaic" of RGB pixel intensities. Demosaicing is also known as debayering, CFA interpolation, or color reconstruction.
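As an illustration of the idea (not CAVA's actual kernel; see `pipe_stages.c` for that), here is a minimal bilinear demosaic sketch for an RGGB Bayer mosaic:

```c
/* A minimal bilinear demosaic for an RGGB Bayer mosaic. Illustration only;
 * CAVA's real kernel lives in pipe_stages.c.
 * raw is rows x cols; out is rows x cols x 3 (interleaved RGB). */
static float at(const float *raw, int rows, int cols, int r, int c) {
    /* Clamp out-of-bounds reads so border pixels reuse the nearest photosite. */
    if (r < 0) r = 0;
    if (r >= rows) r = rows - 1;
    if (c < 0) c = 0;
    if (c >= cols) c = cols - 1;
    return raw[r * cols + c];
}

void demosaic_bilinear(const float *raw, float *out, int rows, int cols) {
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            float R, G, B;
            int red_row = (r % 2 == 0), red_col = (c % 2 == 0);
            if (red_row && red_col) {           /* red photosite */
                R = at(raw, rows, cols, r, c);
                G = (at(raw, rows, cols, r - 1, c) + at(raw, rows, cols, r + 1, c) +
                     at(raw, rows, cols, r, c - 1) + at(raw, rows, cols, r, c + 1)) / 4.0f;
                B = (at(raw, rows, cols, r - 1, c - 1) + at(raw, rows, cols, r - 1, c + 1) +
                     at(raw, rows, cols, r + 1, c - 1) + at(raw, rows, cols, r + 1, c + 1)) / 4.0f;
            } else if (!red_row && !red_col) {  /* blue photosite */
                B = at(raw, rows, cols, r, c);
                G = (at(raw, rows, cols, r - 1, c) + at(raw, rows, cols, r + 1, c) +
                     at(raw, rows, cols, r, c - 1) + at(raw, rows, cols, r, c + 1)) / 4.0f;
                R = (at(raw, rows, cols, r - 1, c - 1) + at(raw, rows, cols, r - 1, c + 1) +
                     at(raw, rows, cols, r + 1, c - 1) + at(raw, rows, cols, r + 1, c + 1)) / 4.0f;
            } else {                            /* green photosite */
                G = at(raw, rows, cols, r, c);
                if (red_row) { /* red neighbors left/right, blue above/below */
                    R = (at(raw, rows, cols, r, c - 1) + at(raw, rows, cols, r, c + 1)) / 2.0f;
                    B = (at(raw, rows, cols, r - 1, c) + at(raw, rows, cols, r + 1, c)) / 2.0f;
                } else {       /* blue neighbors left/right, red above/below */
                    B = (at(raw, rows, cols, r, c - 1) + at(raw, rows, cols, r, c + 1)) / 2.0f;
                    R = (at(raw, rows, cols, r - 1, c) + at(raw, rows, cols, r + 1, c)) / 2.0f;
                }
            }
            out[(r * cols + c) * 3 + 0] = R;
            out[(r * cols + c) * 3 + 1] = G;
            out[(r * cols + c) * 3 + 2] = B;
        }
    }
}
```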
There are many algorithms for denoising, which aims to reduce the level of noise in the image. The default ISP kernel implements a local nonlinear interpolation.
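As a stand-in for the category of algorithm (not necessarily CAVA's exact implementation), here is one classic local nonlinear denoiser, a 3x3 median filter:

```c
#include <stdlib.h>

/* 3x3 median filter: a simple local nonlinear denoiser, shown for
 * illustration; CAVA's actual kernel is in pipe_stages.c. */
static int cmp_float(const void *a, const void *b) {
    float x = *(const float *)a, y = *(const float *)b;
    return (x > y) - (x < y);
}

void denoise_median3x3(const float *in, float *out, int rows, int cols) {
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            float window[9];
            int n = 0;
            for (int dr = -1; dr <= 1; dr++) {
                for (int dc = -1; dc <= 1; dc++) {
                    int rr = r + dr, cc = c + dc;
                    /* Clamp to the image border. */
                    if (rr < 0) rr = 0;
                    if (rr >= rows) rr = rows - 1;
                    if (cc < 0) cc = 0;
                    if (cc >= cols) cc = cols - 1;
                    window[n++] = in[rr * cols + cc];
                }
            }
            qsort(window, 9, sizeof(float), cmp_float);
            out[r * cols + c] = window[4]; /* median of the neighborhood */
        }
    }
}
```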
To perform color balancing, we multiply the RGB color value at each pixel by a 3x3 diagonal matrix whose entries are configurable.
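Because the matrix is diagonal, the operation reduces to one gain per channel. A minimal sketch, with the gain parameters standing in for the configurable matrix entries:

```c
/* White balancing as a per-pixel multiply by a 3x3 diagonal matrix.
 * The gains are placeholders for the configurable diagonal entries. */
void white_balance(float *img, int num_pixels,
                   float r_gain, float g_gain, float b_gain) {
    for (int i = 0; i < num_pixels; i++) {
        img[i * 3 + 0] *= r_gain; /* diagonal entry (0,0) */
        img[i * 3 + 1] *= g_gain; /* diagonal entry (1,1) */
        img[i * 3 + 2] *= b_gain; /* diagonal entry (2,2) */
    }
}
```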
A gamut is the set of colors which fully represents some scenario, whether an image, a color space, or the capability of a particular output device.
For example, preparing an image for printing requires gamut mapping. This is because the image is often specified in RGB, whereas the printer expects the CMYK color space. Gamut mapping performs this transformation from RGB to CMYK so that the image is most faithfully realized in print.
The gamut mapping stage here evaluates a radial basis function (RBF): it computes the L2 distance from each color to a set of control points, weights those distances, and adds a bias.
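A minimal sketch of that evaluation, with the control points, weights, and bias left as placeholders for the configurable parameters:

```c
#include <math.h>

/* RBF evaluation as described above: L2 distance to each control point,
 * weighted per control point, plus a per-channel bias. Illustration only. */
void gamut_map_rbf(const float in[3], float out[3],
                   const float (*ctrl)[3],    /* num_ctrl control points */
                   const float (*weights)[3], /* one weight vector each */
                   const float bias[3], int num_ctrl) {
    for (int ch = 0; ch < 3; ch++)
        out[ch] = bias[ch];
    for (int i = 0; i < num_ctrl; i++) {
        float dx = in[0] - ctrl[i][0];
        float dy = in[1] - ctrl[i][1];
        float dz = in[2] - ctrl[i][2];
        float dist = sqrtf(dx * dx + dy * dy + dz * dz); /* L2 norm */
        for (int ch = 0; ch < 3; ch++)
            out[ch] += weights[i][ch] * dist;
    }
}
```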
Tone mapping approximates the appearance of images whose dynamic range exceeds that of the output device. The process must preserve colors and other aspects of the original image while squeezing its presumably stronger contrast into the feasible range of the output device.
For example, an HDR image may be the result of capturing multiple exposures which together approximate the luminance of the original scene. The tone mapping operator then squeezes this into the lower dynamic range of an output device such as a monitor. Although such approximations may produce unusual artifacts, they preserve image features and often retain a pleasant balance between global contrast and local contrast.
There are a variety of available tone mapping operators (TMOs), which may be either global or local.
This is sometimes called color reproduction or color processing.
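As a concrete example of a global TMO (shown for illustration; not necessarily the operator this stage implements), Reinhard's simple operator maps each value x to x / (1 + x):

```c
/* Reinhard's simple global tone mapping operator: compresses an unbounded
 * dynamic range into [0, 1). Illustration of a global TMO only. */
void tone_map_global(float *img, int num_values) {
    for (int i = 0; i < num_values; i++)
        img[i] = img[i] / (1.0f + img[i]);
}
```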
SMAUG is a framework for deep neural networks (DNNs); it ships with reference implementations, including a model of an actual SoC with multiple DNN accelerators.
The input for CAVA is a raw image.