- We implemented 3x3 convolutions.
- For the Winograd algorithm we used F(2x2, 3x3): each tile combines a 4x4 input patch with the 3x3 filter and produces a 2x2 output tile.
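The F(2x2, 3x3) scheme described above can be illustrated with a minimal NumPy sketch. It is not the project's implementation: it assumes the transform matrices from Lavin and Gray's formulation and CNN-style convolution (no kernel flip), and the names `filter_transform`, `winograd_tile`, `direct_conv`, and `winograd_conv` are hypothetical. The tiled Winograd result is checked against a naive 3x3 convolution.

```python
import numpy as np

# Transform matrices for Winograd F(2x2, 3x3) (Lavin & Gray formulation).
B_T = np.array([[1, 0, -1, 0],
                [0, 1, 1, 0],
                [0, -1, 1, 0],
                [0, 1, 0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=float)


def filter_transform(g):
    """U = G g G^T (4x4); computed once per filter and reused for every tile."""
    return G @ g @ G.T


def winograd_tile(d, U):
    """2x2 output tile from a 4x4 input tile d and a transformed filter U."""
    V = B_T @ d @ B_T.T      # input transform B^T d B (4x4)
    M = U * V                # 16 element-wise multiplications
    return A_T @ M @ A_T.T   # output transform A^T M A (2x2)


def direct_conv(image, g):
    """Naive 'valid' 3x3 convolution (cross-correlation, as in CNNs)."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * g)
    return out


def winograd_conv(image, g):
    """Cover the output with 2x2 tiles; neighbouring 4x4 input tiles overlap by 2.
    Assumes (h - 2) and (w - 2) are even so the tiles fit exactly."""
    h, w = image.shape
    U = filter_transform(g)
    out = np.zeros((h - 2, w - 2))
    for i in range(0, h - 2, 2):
        for j in range(0, w - 2, 2):
            out[i:i + 2, j:j + 2] = winograd_tile(image[i:i + 4, j:j + 4], U)
    return out
```

The element-wise stage needs 16 multiplications per tile, whereas computing the same four outputs directly needs 4 × 9 = 36. The price is the extra additions in the input and output transforms.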
- Install the latest Xcode Command Line Tools
xcode-select --install
- Install Homebrew (brew)
- Install Armadillo
brew install armadillo
- Install LLVM (for OpenMP support)
brew install llvm
- Download the source for Armadillo
- After unzipping it, run the following inside the source directory:
cmake .
make
make install DESTDIR=$HOME/lib
- Compile with
make
- Create a problem file by running
python3 gen_problem.py > [problem filename]
- Use a file in the generated format (see above) as input for any of the programs:
./naive_convolution [input filename] [output filename]
./winograd [input filename] [output filename]
./winograd_openmp [input filename] [output filename]
./winograd_gpu [input filename] [output filename]
- All floating-point additions and multiplications are counted as separate operations.
- These operations are counted manually, because the Haswell architecture does not provide hardware performance counters for floating-point operations.
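The counting rule above can be sketched as follows, as an illustration only. `naive_conv_flops` and `mflops_per_second` are hypothetical helpers, and the sketch assumes each output value of a direct 3x3 convolution is a dot product of 9 × channels terms whose first product initializes the sum (9 × channels multiplications, 9 × channels - 1 additions). The project's own counts may differ, for example if the accumulator starts at zero.

```python
def naive_conv_flops(out_h, out_w, channels=1):
    """Operation count for a direct 3x3 convolution.

    Additions and multiplications are counted separately. Assumes each
    output is a dot product of 9 * channels terms: 9 * channels
    multiplications plus 9 * channels - 1 additions.
    """
    terms = 9 * channels
    per_output = terms + (terms - 1)
    return out_h * out_w * per_output


def mflops_per_second(flops, seconds):
    """Convert an operation count and a wall-clock time into MFlop/s."""
    return flops / seconds / 1e6
```

For a single-channel 1024x1024 output this gives 1024 × 1024 × 17 = 17,825,792 operations.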
- Run the benchmark with
./bench_image_size.sh
./extract_perf.py BENCHMARK_FILE
will extract the MFlop/s values and run times to bench_extract_mflops.csv and bench_extract_times.csv.
- NOTE: The benchmark creates many .in input files that may take up a lot of disk space. Make sure your disk quota does not fill up during benchmarking.
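Before a long benchmark run, it may help to check how much space the generated .in files already use. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes the .in files sit in a single directory. `shutil.disk_usage` reports free space on the whole filesystem, not a per-user quota, so on systems with quotas the site's own quota tool remains the authoritative check.

```python
import shutil
from pathlib import Path


def input_files_size(directory="."):
    """Total size in bytes of the *.in files in `directory` (not recursive)."""
    return sum(p.stat().st_size for p in Path(directory).glob("*.in") if p.is_file())


def free_bytes(directory="."):
    """Free bytes on the filesystem holding `directory` (ignores user quotas)."""
    return shutil.disk_usage(directory).free
```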