engine-cuda is a CUDA/OpenCL engine for the popular OpenSSL cryptography framework.

Primary language: C. License: GNU General Public License v3.0 (GPL-3.0).

engine-cuda / engine-opencl

engine-cuda is an engine for the popular OpenSSL cryptography framework. It runs several block ciphers on a GPU using CUDA or OpenCL. The following block ciphers are currently supported:

  • AES-128, AES-192, AES-256 (ECB, CBC decryption)

  • Blowfish (ECB, CBC decryption)

  • CAST5 (ECB, CBC decryption)

  • Camellia-128 (ECB, CBC decryption)

  • DES (ECB, CBC decryption)

  • IDEA (ECB, CBC decryption)
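A likely reason the list names only decryption for CBC is that CBC decryption parallelizes well: each plaintext block depends only on its own ciphertext block and the previous one, whereas CBC encryption must run block after block. The sketch below illustrates this with a toy, insecure 8-byte cipher. The names `toy_encrypt`, `toy_decrypt`, `cbc_encrypt` and `cbc_decrypt_block` are hypothetical and are not taken from the engine-cuda source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 8 /* toy block size in bytes (assumption, not a real cipher's) */

/* Toy, insecure stand-in for a block cipher; for illustration only. */
static void toy_encrypt(const uint8_t key[BLK], const uint8_t in[BLK],
                        uint8_t out[BLK]) {
    for (size_t i = 0; i < BLK; i++)
        out[i] = (uint8_t)((in[i] ^ key[i]) + i + 1);
}

/* Inverse of toy_encrypt. */
static void toy_decrypt(const uint8_t key[BLK], const uint8_t in[BLK],
                        uint8_t out[BLK]) {
    for (size_t i = 0; i < BLK; i++)
        out[i] = (uint8_t)((uint8_t)(in[i] - (i + 1)) ^ key[i]);
}

/* CBC encryption is sequential: block b needs ciphertext block b-1. */
static void cbc_encrypt(const uint8_t key[BLK], const uint8_t iv[BLK],
                        const uint8_t *pt, uint8_t *ct, size_t nblocks) {
    const uint8_t *prev = iv;
    for (size_t b = 0; b < nblocks; b++) {
        uint8_t tmp[BLK];
        for (size_t i = 0; i < BLK; i++)
            tmp[i] = pt[b * BLK + i] ^ prev[i];
        toy_encrypt(key, tmp, ct + b * BLK);
        prev = ct + b * BLK;
    }
}

/* CBC decryption of block b reads only ct[b] and ct[b-1] (or the IV),
   so every block can be handled independently, e.g. one per GPU thread. */
static void cbc_decrypt_block(const uint8_t key[BLK], const uint8_t iv[BLK],
                              const uint8_t *ct, uint8_t *pt, size_t b) {
    assert(ct != pt); /* in-place use would overwrite ct[b-1] too early */
    const uint8_t *prev = (b == 0) ? iv : ct + (b - 1) * BLK;
    uint8_t tmp[BLK];
    toy_decrypt(key, ct + b * BLK, tmp);
    for (size_t i = 0; i < BLK; i++)
        pt[b * BLK + i] = tmp[i] ^ prev[i];
}
```

Because `cbc_decrypt_block` has no dependency between calls, the blocks can be decrypted in any order, or all at once, and still recover the original plaintext.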

Building engine-cuda

To build and use engine-cuda you will need:

  • CUDA toolkit installed

  • patched OpenSSL (see openssl-patch directory)

  • gcc and the usual build tools

To generate a Makefile, run configure first:

./configure

You might have to supply configure with the path to your CUDA toolkit:

./configure --with-cudatoolkitpath=/opt/cuda

Some helpful autoconf flags:

--with-gpuarch=sm_xx (to make nvcc compile for a certain compute capability)
--with-buffersize=<bytes> (to set the size of the staging buffer between CPU and GPU)
--with-maxrregcount=<regs> (to set the maximum registers per kernel thread)
--disable-libopencl (to disable building engine-opencl)
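To give a feel for what the staging buffer size affects, here is a minimal sketch of pushing data through a fixed-size staging buffer in chunks. It is an illustration under assumptions, not engine-cuda's actual code: `process_staged`, `chunk_fn` and `xor_ff` are hypothetical names, and plain `memcpy` calls stand in for the host-to-device and device-to-host transfers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a GPU kernel: transforms staged bytes in place. */
typedef void (*chunk_fn)(uint8_t *buf, size_t len);

/* Example "kernel": flips every bit. */
static void xor_ff(uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0xFF;
}

/* Pushes len bytes through a staging buffer of bufsize bytes.
   Returns the number of round trips (copy in, process, copy out). */
static size_t process_staged(const uint8_t *in, uint8_t *out, size_t len,
                             uint8_t *staging, size_t bufsize, chunk_fn fn) {
    assert(bufsize > 0);
    size_t trips = 0;
    for (size_t off = 0; off < len; off += bufsize) {
        size_t n = (len - off < bufsize) ? len - off : bufsize;
        memcpy(staging, in + off, n);  /* stands in for host-to-device copy */
        fn(staging, n);                /* stands in for the kernel launch */
        memcpy(out + off, staging, n); /* stands in for device-to-host copy */
        trips++;
    }
    return trips;
}
```

In this sketch, 10 bytes through a 4-byte buffer take 3 round trips, while a 16-byte buffer needs only 1. A larger buffer means fewer transfers at the cost of more memory.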

After that, build and install:

make
make install

For further information, please refer to http://code.google.com/p/engine-cuda/

Development

Further helpful autoconf switches for debugging and developing engine-cuda:

--enable-debug (enables some verbose output)
--enable-timing (necessary to get kernel execution timing output)
--enable-verbose (enables verbose compilation output including register use)
--disable-aes-coarse (use fine-grained implementation of AES)

Benchmarks of engine-cuda, engine-opencl and the individual kernels can be run using the files in the test/ subdirectory. To test the whole engine, run the following in the test/plots subdirectory:

./test-speed.sh <avg_runs> <GPUONLY/CPUONLY> <ciphers/"all"> <engine>

Plots are generated automatically using gnuplot. A test with 5 iterations over all block ciphers will take some time.

To measure the raw kernel execution time, use the following scripts in test/:

./generate-files.sh <device> (/dev/zero or /dev/urandom)
./test-kernels.sh <key> <ciphers>

The first generates test files from the given device; the second runs them through the kernels.

During development, use

./test-correctness.sh <cipher/"all"> <file> <key> <bufsize>

to compare the output of engine-cuda / engine-opencl with the CPU output and verify that the algorithm works correctly.
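When such a comparison fails, it can help to know which cipher block differs first, since the block index hints at which part of the parallel computation went wrong. The helper below is a hypothetical sketch of that idea; `first_mismatched_block` is not part of the test scripts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compares a CPU reference output with a GPU output, block by block.
   Returns the index of the first differing block, or -1 if they match. */
static long first_mismatched_block(const uint8_t *cpu, const uint8_t *gpu,
                                   size_t len, size_t block) {
    assert(block > 0);
    for (size_t off = 0; off < len; off += block) {
        /* The last block may be shorter than `block` bytes. */
        size_t n = (len - off < block) ? len - off : block;
        if (memcmp(cpu + off, gpu + off, n) != 0)
            return (long)(off / block);
    }
    return -1;
}
```

For example, with 8-byte blocks a single corrupted byte at offset 17 is reported as block 2.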

Caveats

Many (sometimes very strange) problems can arise when using and developing with CUDA and OpenCL. We have tried to identify these in the documentation and source, but you may run into others as well.

For CUDA, enabling persistence mode and compute-exclusive mode removes the initial lag:

nvidia-smi -pm 1
nvidia-smi -c 1

The first command sets persistence mode; the second sets compute-exclusive mode.

Credits

This is a full credits-file of people that have contributed to this project.

Donations

Since a few people asked me, you can donate here: http://pledgie.com/campaigns/18855