firestorm
Evaluation of fast (and eventually distributed) vector processing for dot products. A set of 2048-element, 32-bit floating-point vectors is processed against a single query vector.
Different operation types are implemented for evaluation:
- plain, naive for-loop
- 8-fold unrolled for-loop
- hand-tuned AVX optimized for-loop
- hand-tuned SSE4.2 optimized for-loop
- OpenMP SIMD optimized for-loop
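To make the difference between the first two variants and the OpenMP one concrete, here is a minimal sketch. It is not taken from the project sources: the function names `dot_naive`, `dot_unrolled8` and `dot_omp_simd` are hypothetical, and the actual kernels may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain, naive for-loop: a single accumulator, one multiply-add per iteration.
float dot_naive(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// 8-fold unrolled for-loop: eight independent accumulators shorten the
// dependency chain on a single sum; a scalar tail handles leftover elements.
float dot_unrolled8(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    float acc[8] = {0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        acc[0] += a[i + 0] * b[i + 0];
        acc[1] += a[i + 1] * b[i + 1];
        acc[2] += a[i + 2] * b[i + 2];
        acc[3] += a[i + 3] * b[i + 3];
        acc[4] += a[i + 4] * b[i + 4];
        acc[5] += a[i + 5] * b[i + 5];
        acc[6] += a[i + 6] * b[i + 6];
        acc[7] += a[i + 7] * b[i + 7];
    }
    float sum = ((acc[0] + acc[1]) + (acc[2] + acc[3])) +
                ((acc[4] + acc[5]) + (acc[6] + acc[7]));
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// OpenMP SIMD for-loop: the pragma asks the compiler to vectorize the
// reduction. Without OpenMP enabled the pragma is ignored and the loop
// behaves like the naive version.
float dot_omp_simd(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const float* pa = a.data();
    const float* pb = b.data();
    const std::size_t n = a.size();
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += pa[i] * pb[i];
    }
    return sum;
}
```

Because floating-point addition is not associative, the unrolled and vectorized variants may differ from the naive loop in the last bits for general inputs. They agree exactly only when every intermediate sum is representable, e.g. for small integer-valued inputs.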
CMake Options
These options configure the CMake build. To enable an option OPTION, set -DOPTION=ON; to disable it, use -DOPTION=OFF.
To build in release mode with link-time optimization enabled, run e.g.
cmake -DCMAKE_BUILD_TYPE=Release \
-DFSTM_ENABLE_LTO=ON \
..
Build and linker options
Option | Default | Description |
---|---|---|
FSTM_ENABLE_LTO | ON | Enables link-time optimization (if available). |
FSTM_ENABLE_CCACHE | ON | Enables ccache support when building (if available). |
Google Performance Tools
Option | Default | Description |
---|---|---|
FSTM_WITH_PROFILER | OFF | Builds with performance profiler support. |
FSTM_WITH_TCMALLOC | ON | Builds with tcmalloc support. |
Functionality and performance options
Option | Default | Description |
---|---|---|
FSTM_WITH_FAST_MATH | OFF | Enables fast math optimizations (if available). |
FSTM_WITH_OPENMP | ON | Enables OpenMP support (if available). |
FSTM_WITH_SIMD_AVX2 | OFF | Builds with AVX2 support. |
FSTM_WITH_SIMD_AVX | OFF | Builds with AVX support. |
FSTM_WITH_SIMD_SSE42 | ON | Builds with SSE 4.2 support. |
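As an illustration of how a SIMD option can affect the compiled code, the sketch below selects an AVX path at compile time and falls back to a scalar loop otherwise. It assumes that enabling FSTM_WITH_SIMD_AVX results in a compiler flag such as -mavx, which makes GCC and Clang define the `__AVX__` macro; how the project actually wires this up is not shown here. The names `dot_simd` and `dot_simd_path` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Dot product of two float arrays of length n.
// Uses 256-bit AVX registers (8 floats per step) when the compiler targets
// AVX, and a plain scalar loop otherwise or for the remaining elements.
float dot_simd(const float* a, const float* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    std::size_t i = 0;
    float sum = 0.0f;
#if defined(__AVX__)
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        // Unaligned loads, so the inputs need no special alignment.
        const __m256 va = _mm256_loadu_ps(a + i);
        const __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    // Horizontal sum: spill the eight lanes and add them up.
    alignas(32) float lanes[8];
    _mm256_store_ps(lanes, acc);
    for (float lane : lanes) {
        sum += lane;
    }
#endif
    // Scalar tail (or the whole array when AVX is not available).
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Reports which path was compiled in; useful when comparing builds.
const char* dot_simd_path() {
#if defined(__AVX__)
    return "avx";
#else
    return "scalar";
#endif
}
```

Compiling the same file with and without -mavx and printing `dot_simd_path()` is a quick way to check which variant a given build configuration produced.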
Development options
Option | Default | Description |
---|---|---|
FSTM_BUILD_TESTS | ON | Builds unit tests. |
Unit Tests
Unit tests are provided by means of googletest.
To enable CTest testing functionality and building of the unit tests project, configure CMake with the -DFSTM_BUILD_TESTS=1 option, e.g. using
mkdir build && cd build
cmake -DFSTM_BUILD_TESTS=1 ..
make
make test
CPU Profiling
Google Performance Tools
Support for gperftools is available in the project. To profile with its CPU profiler, enable support in CMake (-DFSTM_WITH_PROFILER=ON), then run the program with the following environment variables set:
CPUPROFILE=/tmp/firestorm.prof
CPUPROFILE_FREQUENCY=1000
This creates the file specified in CPUPROFILE, containing the sampling information. To display the profiling data, call
pprof --web firestorm /tmp/firestorm.prof
Or, if kcachegrind is available:
pprof --callgrind firestorm /tmp/firestorm.prof > firestorm.callgrind
kcachegrind firestorm.callgrind
If the file is empty, the application likely didn't exit normally. More interesting information can be found here.
Valgrind
In order to profile the call graph with Valgrind and KCachegrind, run the application using
valgrind --tool=callgrind ./firestorm
This creates an output file, e.g. callgrind.out.18360, where the numeric suffix is the process ID.
You can then visualize the results using that file with
kcachegrind callgrind.out.18360
Cache and branch prediction profiling
Read here for further information.