
cudaRNN

Minimal wrapper that uses cuDNN to implement efficient RNNs on the GPU. Very easy to use.


Benchmarks

The following benchmarks compare this implementation against TensorFlow, using a GTX 1070 Ti GPU.

Memory used by TensorFlow compared to this implementation, as a function of hiddenSize:

mem vs hiddenSize

Speedup of this implementation over TensorFlow as a function of the sequence length seqLength, for both LSTM and GRU cells:

speedup vs seqLength

Speedup over TensorFlow as a function of the number of hidden units hiddenSize:

speedup vs hiddenSize

Time per iteration (in ms) as a function of hiddenSize for LSTM cells. Static persistent kernels are used whenever possible:

time vs hiddenSize

How to use

The library is contained within the cudaRNN namespace. The workflow is straightforward and similar to TensorFlow's; a minimal usage sketch follows the list below.

  • Initialize the structure cudaRNN::RNNOptions_t.
  • Instantiate cudaRNN::RNN using that structure. The class is templated on two arguments: the first is the data type of the inputs and targets (int, float, or double), and the second is the data type of the weights (__half, float, or double).
  • Initialize the inputs, ordered as [inLength, nSequences, inVecSize], and the targets, ordered as [outLength, nSequences, inVecSize], using the methods setInputs and setTargets.
  • Optionally select an optimizer and a loss metric through setOptimizer and setMetrics.
  • Call train.
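
For reference, here is a minimal end-to-end sketch of that workflow. It is not copied from the library's documentation: the header name, the exact argument types of setInputs, setTargets, setOptimizer, and setMetrics, and the example dimensions are assumptions; the actual members of RNNOptions_t are listed in the tables below.

```cpp
#include <vector>
#include "cudaRNN.h"   // assumed header name

int main() {
    // Example dimensions (arbitrary values, for illustration only).
    const int inLength = 20, outLength = 20, nSequences = 64, inVecSize = 32;

    // 1. Fill the options structure. Its public variables and enumerations
    //    are listed in the tables below; they are omitted here.
    cudaRNN::RNNOptions_t options;

    // 2. Instantiate the network. First template argument: data type of the
    //    inputs and targets; second: data type of the weights.
    cudaRNN::RNN<float, float> rnn(options);

    // 3. Provide inputs and targets, laid out contiguously as
    //    [inLength, nSequences, inVecSize] and [outLength, nSequences, inVecSize].
    //    Whether the setters take a std::vector, a raw pointer, or something
    //    else is an assumption here.
    std::vector<float> inputs(inLength * nSequences * inVecSize);
    std::vector<float> targets(outLength * nSequences * inVecSize);
    rnn.setInputs(inputs);
    rnn.setTargets(targets);

    // 4. Optionally choose an optimizer and a loss metric (the arguments are
    //    presumably the enumerations listed below, hence left out here).
    // rnn.setOptimizer(...);
    // rnn.setMetrics(...);

    // 5. Train.
    rnn.train();
    return 0;
}
```

Assuming a row-major interpretation of that ordering, the input element for time step t, sequence s, and component c would sit at flat index (t * nSequences + s) * inVecSize + c.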

Public variables of the structure RNNOptions_t

table1

This structure contains the following enumerations:

table2
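
Since the two tables above are rendered as images, the fragment below only mimics the pattern they describe: a plain options structure whose public variables include enumerations that select, for example, the cell type. Every name in it is hypothetical and does not necessarily match the actual cudaRNN definitions.

```cpp
// Hypothetical illustration only -- not the actual cudaRNN declarations.
namespace example {
    enum class cellType_t { LSTM, GRU };        // hypothetical enumeration
    struct RNNOptions_t {                       // hypothetical public variables
        cellType_t cellType   = cellType_t::LSTM;
        int        hiddenSize = 256;
        int        seqLength  = 20;
        int        nSequences = 64;
    };
}

int main() {
    example::RNNOptions_t options;
    options.cellType   = example::cellType_t::GRU;  // pick the cell via the enum
    options.hiddenSize = 512;
    return 0;
}
```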