NumPy-DNC

A pure NumPy implementation of the Differentiable Neural Computer (DNC), intended to be read as an aid to understanding the architecture. Compared to well-developed deep learning frameworks, one advantage of reading a NumPy implementation is that the details of vectorizing computations to speed up training and inference are explicit. Operations are broken down according to the corresponding subsections in the paper.

Code Structure

The code is currently structured as follows:

  • accessor.py implements the write and read interactions with memory based on the interface vector emitted by the controller, as detailed in the METHODS section of the paper.
  • dnc_lstm.py implements a DNC model with a single-layer LSTM controller.
  • dnc_ff.py implements a DNC model with a feedforward controller.
  • util.py includes a variety of helper functions.
  • unit_test.py provides test cases to check the correctness of each operation and its gradient.
  • CopyTask.ipynb provides an example trainer for the DNC, with visualization of results on the copy task.
  • RepeatedCopyTask.ipynb provides visualization of results on the repeated copy task.

The directory structure is subject to change. The code is written for Python 2.7.

Setup

The project relies on the NumPy-based autograd library (pip install autograd) for automatic differentiation during backprop. In the near future, I plan to remove this dependency by building my own automatic differentiation DAG constructor.
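
As a quick sanity check that autograd plays the role described above, here is a minimal sketch (not part of the repository) that differentiates a small NumPy function; the only requirement is to import autograd.numpy in place of numpy so that operations are traced.

```python
import autograd.numpy as np   # drop-in replacement for numpy that records operations
from autograd import grad

def toy_loss(w, x, y):
    # Binary cross-entropy of a single linear unit (illustrative only).
    p = 1.0 / (1.0 + np.exp(-np.dot(x, w)))
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

loss_grad = grad(toy_loss)            # gradient with respect to the first argument, w

x = np.random.randn(16, 4)
y = (np.random.rand(16) > 0.5).astype(float)
print(loss_grad(np.zeros(4), x, y))   # array with the same shape as w
```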

Differentiable Neural Computer

The DNC belongs to the family of memory-augmented neural networks, which are designed to represent variables and data structures. Results from the paper show the potential of the DNC to solve complex, structured tasks that are difficult for neural networks without external memory. The predecessor of the DNC is the Neural Turing Machine (NTM). The major differences between the two arise from the mechanisms used to access the memory matrix:

  • NTM has no way to prevent overlapping or interference among allocated memory locations. The dynamic memory allocation mechanism allows DNC to avoid this problem.
  • NTM cannot free used locations. DNC incorporates the free gates to address this problem.
  • NTM can only preserve sequential information in contiguous memory. DNC instead uses a temporal link matrix to track the order in which writes were made, effectively decoupling sequential information from memory location (see the sketch after this list).
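
To illustrate the last point, below is a minimal sketch (plain NumPy, not the repository code) of the temporal link matrix and precedence weighting updates described in the METHODS section of the paper; L is the N x N link matrix, p the precedence weighting from the previous step, and ww the current write weighting.

```python
import numpy as np

def update_temporal_link(L, p, ww):
    # L[i, j] ~ "location i was written to right after location j".
    # L_t[i, j] = (1 - ww[i] - ww[j]) * L_{t-1}[i, j] + ww[i] * p_{t-1}[j]
    L = (1.0 - ww[:, None] - ww[None, :]) * L + np.outer(ww, p)
    L = L * (1.0 - np.eye(L.shape[0]))        # zero the diagonal after each update
    # Precedence: p_t = (1 - sum(ww)) * p_{t-1} + ww
    p = (1.0 - ww.sum()) * p + ww
    return L, p
```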

Tasks

Currently, the copy and repeated copy tasks are implemented:

Copy

In this task, a sequence of bit vectors is provided to the model. The input sequence starts with a reserved start vector, then some random binary bit vectors, followed by a reserved stop vector. The model is expected to start reproducing the exact input bit vectors immediately after observing the stop vector.
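
For concreteness, the snippet below is a minimal sketch of how such an input/target pair could be generated (the names seq_len and seq_wid and the exact layout are illustrative, not necessarily those used in CopyTask.ipynb); the start and stop markers occupy two extra input channels, and the target carries one extra channel for the output stop marker.

```python
import numpy as np

def make_copy_example(seq_len=4, seq_wid=6):
    T = 2 * seq_len + 3                                   # input phase + output phase + markers
    inputs = np.zeros((seq_wid + 2, T))                   # 2 extra rows: start / stop channels
    targets = np.zeros((seq_wid + 1, T))                  # 1 extra row: output stop channel

    bits = (np.random.rand(seq_wid, seq_len) > 0.5).astype(float)

    inputs[-2, 0] = 1.0                                   # reserved start vector
    inputs[:seq_wid, 1:seq_len + 1] = bits                # the random bit vectors
    inputs[-1, seq_len + 1] = 1.0                         # reserved stop vector

    targets[:seq_wid, seq_len + 2:2 * seq_len + 2] = bits # reproduce after the stop vector
    targets[-1, -1] = 1.0                                 # reserved stop vector for output
    return inputs, targets
```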

Input: The first column of input is the reserved start vector, and the 6th column is the reserved stop vector for input

Target: The last column is the reserved stop vector for output

Output: Produced by a DNC with an LSTM controller, DNC(input_size=seq_wid+2, output_size=seq_wid+1, hidden_size=32, R=2, N=64, W=8), after 30k iterations of training

Repeated Copy

In this task, a sequence of bit vectors is provided to the model. The input sequence starts with a reserved start vector, then some random binary bit vectors, followed by a reserved count vector that encodes the intended number of repetitions in a reserved channel. The model is expected to start reproducing the exact input bit vectors for the given number of repetitions immediately after observing the count vector.

Input: The first column of input is the reserved start vector, and the 6th column is the reserved count vector for input, with the expected number of repetitions placed in the count channel (first row). The color shown in the image below is normalized; here the expected number of repetitions is 4.

Target: The last column is the reserved stop vector for output

Output: Produced by a DNC with an LSTM controller, DNC(input_size=seq_wid+2, output_size=seq_wid+1, hidden_size=32, R=2, N=64, W=8), after 50k iterations of training

TODOs

  • Complete unit tests and gradient checks for the model.
  • Update results on the copy task.
  • Add checks during training to assess memory usage and allocation.
  • Expand to batched training and prediction, and optimize for training speed.
  • Implement and benchmark additional tasks, such as the graph tasks from the paper.

Notes & Observations

  • The cumulative product needs to be implemented with slicing so that the gradient can pass through (see the allocation weighting sketch after this list).
  • A small number (1e-20) needs to be added to the denominator of cosine similarity calculation to avoid dividing by zero.
  • Initial read vector, read weighting, write weighting, and usage vectors are currently initialized to a small number 1e-6. Will test out random initialization, as well as treating the initial values as learnable parameters.
  • For content-based addressing with small word size and number of memory locations, np.einsum provides faster computation than np.inner.
  • The diagonal of temporal linkage matrix should be set to 0 after each update.
  • For the allocation weighting, the calculation is performed in the order given by sorting the usage vector with np.argsort. We thus need to apply the inverse permutation before returning the resulting allocation weighting; the inverse permutation is obtained by applying np.argsort to the permuted index list (see the sketch after this list).
  • Calculation of usage is not differentiable with respect to the write weights. It may be necessary to stop the gradient from passing through the write weights during backprop, or alternatively to detach the memory at each time step.
  • Be careful when using the numerical gradient check in unit_test.py: multiple calls to a function may change its internal state multiple times compared to one pass with autodiff (e.g. self.states.append(dict(zip(['u', 'ww', 'p', 'L', 'rw'], [u, ww, p, L, rw]))) in accessor.py breaks the numerical gradient; the accessor must be reinitialized before each function evaluation in unit_test.py).
  • For the copy and repeated copy tasks, DeepMind's official implementation does not penalize the output emitted by the DNC while it is receiving input; only the output after the reserved stop vector is used for the loss calculation. I instead penalize output emitted while the DNC is reading the input, and it turns out the controller is able to learn to suppress its output during the input phase.
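
To make the notes on content-based addressing and allocation weighting concrete, here is a minimal plain-NumPy sketch of both operations (illustrative only, not the repository code), showing the epsilon-padded cosine similarity, np.einsum, a slicing-based exclusive cumulative product, and the argsort-based inverse permutation.

```python
import numpy as np

def content_weighting(M, k, beta, eps=1e-20):
    # Content-based addressing: softmax over the cosine similarity between the
    # lookup key k (W,) and every memory row of M (N, W), sharpened by beta.
    dots = np.einsum('nw,w->n', M, k)                       # row-wise dot products
    norms = np.sqrt(np.einsum('nw,nw->n', M, M)) * np.sqrt(np.dot(k, k))
    cos = dots / (norms + eps)                              # eps avoids division by zero
    e = np.exp(beta * cos)
    return e / e.sum()

def allocation_weighting(u):
    # Allocation weighting from the usage vector u (N,), values in [0, 1].
    phi = np.argsort(u)                                     # locations from least to most used
    u_sorted = u[phi]
    # One slicing-based way to build the exclusive cumulative product without
    # in-place assignment (the repository's exact construction may differ).
    prod = np.array([np.prod(u_sorted[:j]) for j in range(len(u_sorted))])
    a_sorted = (1.0 - u_sorted) * prod
    # argsort of the permutation gives its inverse, undoing the sort.
    return a_sorted[np.argsort(phi)]
```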