firestorm

Evaluation of fast (and eventually distributed) vector processing for dot products. A batch of 2048-element 32-bit floating-point vectors is processed against a single query vector.
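As a rough illustration of the workload described above, the following sketch scores a small batch of vectors against one query vector using std::inner_product from the standard library. The names score_batch and kDim and the flat row-major storage layout are assumptions made for this example; they are not taken from the project sources.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical constant: the per-vector element count mentioned above.
constexpr std::size_t kDim = 2048;

// Hypothetical helper: computes one dot product per stored vector.
// `batch` holds the vectors back to back (row-major), so vector i
// occupies batch[i * kDim] .. batch[(i + 1) * kDim - 1].
// `query` must contain at least kDim elements.
std::vector<float> score_batch(const std::vector<float>& batch,
                               const std::vector<float>& query) {
    const std::size_t count = batch.size() / kDim;
    std::vector<float> scores(count);
    for (std::size_t i = 0; i < count; ++i) {
        const float* row = batch.data() + i * kDim;
        // Sum of element-wise products, accumulated in single precision.
        scores[i] = std::inner_product(row, row + kDim, query.begin(), 0.0f);
    }
    return scores;
}
```

Keeping all vectors in one contiguous buffer is only one possible layout; it is chosen here because it keeps memory access sequential.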

Different operation types are implemented for evaluation:

  • plain, naive for-loop
  • 8-fold unrolled for-loop
  • hand-tuned AVX optimized for-loop
  • hand-tuned SSE4.2 optimized for-loop
  • OpenMP SIMD optimized for-loop
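
To make the first two variants concrete, here is a minimal sketch of a plain loop and an 8-fold unrolled loop. The function names dot_naive and dot_unrolled8 are hypothetical, and the use of eight independent accumulators is an assumption about how such an unrolling might be written, not a description of the project's actual implementation.

```cpp
#include <cstddef>

// Plain loop: one multiply and one add per iteration, single accumulator.
float dot_naive(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// 8-fold unrolled loop. Eight separate accumulators (an assumption of
// this sketch) remove the dependency between consecutive additions.
float dot_unrolled8(const float* a, const float* b, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    float s4 = 0.0f, s5 = 0.0f, s6 = 0.0f, s7 = 0.0f;
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
        s4 += a[i + 4] * b[i + 4];
        s5 += a[i + 5] * b[i + 5];
        s6 += a[i + 6] * b[i + 6];
        s7 += a[i + 7] * b[i + 7];
    }
    float sum = ((s0 + s1) + (s2 + s3)) + ((s4 + s5) + (s6 + s7));
    // Handle the remaining n % 8 elements.
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because floating-point addition is not associative, the two functions can differ in the last bits for general inputs, so comparisons between variants should allow a small tolerance.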

CMake Options

These options configure the CMake build. To enable an option OPTION, set -DOPTION=ON; to disable it, set -DOPTION=OFF. For example, to build in release mode with link-time optimization enabled, run

cmake -DCMAKE_BUILD_TYPE=Release \
      -DFSTM_ENABLE_LTO=ON \
      ..

Build and linker options

Option              Default  Description
FSTM_ENABLE_LTO     ON       Enables link-time optimization (if available).
FSTM_ENABLE_CCACHE  ON       Enables ccache support when building (if available).

Google Performance Tools

Option              Default  Description
FSTM_WITH_PROFILER  OFF      Builds with performance profiler support.
FSTM_WITH_TCMALLOC  ON       Builds with tcmalloc support.

Functionality and performance options

Option                Default  Description
FSTM_WITH_FAST_MATH   OFF      Enables fast math optimizations (if available).
FSTM_WITH_OPENMP      ON       Enables OpenMP support (if available).
FSTM_WITH_SIMD_AVX2   OFF      Builds with AVX2 support.
FSTM_WITH_SIMD_AVX    OFF      Builds with AVX support.
FSTM_WITH_SIMD_SSE42  ON       Builds with SSE 4.2 support.

Development options

Option            Default  Description
FSTM_BUILD_TESTS  ON       Builds unit tests.

Unit Tests

Unit tests are provided by means of googletest. To enable CTest testing functionality and building of the unit tests project, configure CMake with the -DFSTM_BUILD_TESTS=1 option, e.g. using

mkdir build && cd build
cmake -DFSTM_BUILD_TESTS=1 ..
make
make test

CPU Profiling

Google Performance Tools

Support for Gperftools is available in the project. To profile using its CPU profiler, enable support in CMake (-DFSTM_WITH_PROFILER=ON), then run the program with the following environment variables set:

CPUPROFILE=/tmp/firestorm.prof
CPUPROFILE_FREQUENCY=1000

This creates the file named in CPUPROFILE, containing the sampling information. To display the profiling data, call

pprof --web firestorm /tmp/firestorm.prof

Or, if kcachegrind is available:

pprof --callgrind firestorm /tmp/firestorm.prof > firestorm.callgrind
kcachegrind firestorm.callgrind

If the profile file is empty, the application likely did not exit normally. More interesting information can be found here.

Valgrind

To profile the call graph with Valgrind and KCachegrind, run the application using

valgrind --tool=callgrind ./firestorm

This creates an output file, e.g. callgrind.out.18360. You can then visualize the results from that file with

kcachegrind callgrind.out.18360

Cache and branch prediction profiling

Read here for further information.

Reading material