Recurrent Neural Network from a node perspective
Important: This project is written in Python 3.5!
Neither 2.7 nor 3.4 will work (among other things, the @ infix operator used below requires 3.5).
The backbone is the nodes module. Here you will find the most commonly used node types defined:
- InputNode -> outputs the matrix that was passed at node creation
- ParameterNode -> same as InputNode, but also copies dCost/dParameter into the gradient matrix passed at node creation
- GroundTruthNode -> same as InputNode, but simply another name to make debugging easier
- AdditionNode -> outputs the sum of its input nodes
- DotNode -> outputs the dot product of its first and second input nodes
- LogisticTransformNode -> outputs the logistic transform of its input node
- EntropyCostNode -> outputs the entropy error between the ground-truth input and the estimated input
- ConcatenateNode -> outputs all its input nodes concatenated along a specified axis
All input matrices are assumed to be numpy matrices. This is necessary for some functionality to work properly.
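As a minimal sketch of how the pieces fit together before the full example below, here a single logistic unit is built and forward propagated. The import path, the shapes and the random initialisation are illustrative assumptions:

import numpy as np
from nodes import InputNode, ParameterNode, DotNode, LogisticTransformNode  # import path assumed

x = np.matrix(np.random.randn(1, 3))        # one sample with 3 features (shape chosen for illustration)
weights = np.matrix(np.random.randn(3, 2))  # maps 3 features to 2 units
gradients = np.matrix(np.zeros((3, 2)))     # buffer that backprop will fill with dCost/dweights

h = LogisticTransformNode(DotNode(InputNode(x), ParameterNode(weights, gradients)))
h.forward_prop()
print(h.output)                             # logistic(x * weights), shape (1, 2)

The same building blocks compose into a small recurrent network over two time steps (the x, y, weight and gradient matrices below are placeholders):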
x1 = someinputmatrixattime1
x2 = someinputmatrixattime2
y = GroundTruthNode(outputattime1and2)
h0 = InputNode(zeros)
theta_input = ParameterNode(weights_input, gradientmatrix_input)
theta_prev = ParameterNode(weights_prev, gradientmatrix_prev)
theta_out = ParameterNode(weights_out, gradientmatrix_out)
# hidden state and estimate at time 1
h1 = LogisticTransformNode(AdditionNode(DotNode(InputNode(x1), theta_input), DotNode(h0, theta_prev)))
y_hat1 = LogisticTransformNode(DotNode(h1, theta_out))
# hidden state and estimate at time 2
h2 = LogisticTransformNode(AdditionNode(DotNode(InputNode(x2), theta_input), DotNode(h1, theta_prev)))
y_hat2 = LogisticTransformNode(DotNode(h2, theta_out))
y_hat_vector = ConcatenateNode(y_hat1, y_hat2)
cost = EntropyCostNode(y, y_hat_vector)
The cost can be computed by forward propagating and fetching the output of the cost node:
cost.forward_prop()
C = cost.output
Cost gradients can be obtained by backpropagating (this requires that forward propagation has been done first). The resulting gradients are stored in the gradient matrices supplied at the declaration of the parameter nodes:
cost.start_backprop()
dCost_dthetainput = gradientmatrix_input
Use these gradients to update the weights by directly changing the weight matrices passed at the declaration of the parameter nodes:
weights_input += gradient2updatestep(dCost_dthetainput)
Recompute and get a new y estimate by forward propagating from the concatenation node:
y_hat_vector.forward_prop()
new_estimate = y_hat_vector.output
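Putting these steps together, a plain gradient-descent training loop might look like the sketch below. The learning rate, the iteration count and the simple SGD update are assumptions for illustration; gradient2updatestep above stands for whatever update rule you choose:

learning_rate = 0.1                 # illustrative value
for step in range(100):             # illustrative number of iterations
    cost.forward_prop()             # recompute all outputs
    cost.start_backprop()           # fill the gradient matrices of the parameter nodes
    # update in place so the ParameterNodes keep seeing the same weight matrices
    weights_input -= learning_rate * gradientmatrix_input
    weights_prev -= learning_rate * gradientmatrix_prev
    weights_out -= learning_rate * gradientmatrix_out
print(cost.output)                  # final cost after training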
To make code easier to understand, some infix operators are baked in at object creation:
- @:
node1 @ node2 -> DotNode(node1, node2)
- +:
node1 + node2 -> AdditionNode(node1, node2)
If node1 is a concatenation node -> ConcatenateNode(node2, *node1.input_nodes)
- +=:
node1 += node2 -> node1.append(node2)
If node1 is a concatenation node the effect is the same
Using these, we can refactor the simple network:
x1 = someinputmatrixattime1
x2 = someinputmatrixattime2
y = GroundTruthNode(outputattime1and2)
y_hat = ConcatenateNode()
h0 = InputNode(zeros)
theta_input = ParameterNode(weights_input, gradientmatrix_input)
theta_prev = ParameterNode(weights_prev, gradientmatrix_prev)
theta_out = ParameterNode(weights_out, gradientmatrix_out)
h1 = LogisticTransformNode((InputNode(x1) @ theta_input) + (h0 @ theta_prev))
y_hat += LogisticTransformNode(h1 @ theta_out)
h2 = LogisticTransformNode((InputNode(x2) @ theta_input) + (h1 @ theta_prev))
y_hat += LogisticTransformNode(h2 @ theta_out)
cost = EntropyCostNode(y, y_hat)
To make debugging easier, it is possible to provide a name keyword argument when creating a node. This name should be a tuple of (name, level, timestep). In utils you will find the class NodeName, a namedtuple that makes naming a bit easier. Naming a node makes the repr function use this name instead of the usual representation.
print(LogisticTransformNode((InputNode(x1) @ theta_input) + (h0 @ theta_prev)).__repr__())
>>> LogisticTransformNode(AdditionNode(DotNode(InputNode(), ParameterNode()), DotNode(InputNode(), ParameterNode())))
print(LogisticTransformNode((InputNode(x1, name=('x', 1)) @ theta_input) + (h0 @ theta_prev)).__repr__())
>>> LogisticTransformNode(AdditionNode(DotNode(x[1], ParameterNode()), DotNode(InputNode(), ParameterNode())))
print(LogisticTransformNode((InputNode(x1) @ theta_input) + (h0 @ theta_prev), name=NodeName('h', t=1, l=1)).__repr__())
>>> h[1,1]
To make a new node you should inherit from one of the node ABCs:
- ParentNode: for nodes that have one or more input nodes
- Topnode: for nodes that do not have a parent themselves; mostly for new cost nodes
- Node: the most basic one; use only if the above ABCs are not applicable
Additionally, there is a mutable-input mixin that makes it possible to add or remove input nodes after object creation. This mixin is used, for example, in AdditionNode and ConcatenateNode.
Any node just has to implement the abstract methods compute_cost_gradient_wrt_input_node and compute_output. These are essential for forward and backward propagation.
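As a sketch, a hypothetical TanhTransformNode could look roughly like this. The ParentNode constructor, the input_nodes attribute and the exact method signatures are assumptions based on the description above, so check an existing node such as LogisticTransformNode for the real interface:

import numpy as np
from nodes import ParentNode  # import path assumed

class TanhTransformNode(ParentNode):
    # hypothetical node: outputs the elementwise tanh of its single input node

    def compute_output(self):
        # assumed contract: read the input node's output and return this node's output
        return np.tanh(self.input_nodes[0].output)

    def compute_cost_gradient_wrt_input_node(self, input_node):
        # assumed contract: chain dCost/dOutput (assumed available as self.gradient after backprop)
        # with the local derivative 1 - tanh(x)^2
        return np.multiply(self.gradient, 1 - np.power(self.output, 2))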
To check your implementation, any topnode has the method check_grad(self, h=0.001, tol=.1),
which will throw an error if any grandchild reports a gradient that differs too much from the central-difference approximation of the gradient. NB: this check is only done for ancestors of parameter nodes.
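For example, the cost node from the network above can be checked with the default step size and tolerance:

cost.check_grad(h=0.001, tol=0.1)   # throws an error if an analytic gradient strays too far from the central-difference estimate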
Recipes are ways of putting nodes together to form fully fledged networks and can be found in the recipe
module. At the time of writing, only a very simple SRNN recipe is there, and no ABC for creating a new recipe has been made.
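To illustrate the idea (this is not the actual recipe in the module), a recipe could simply be a function that wires up the nodes from the refactored example for any number of time steps and hands back the cost and estimate nodes:

def simple_rnn_recipe(xs, y, weights_input, gradientmatrix_input,
                      weights_prev, gradientmatrix_prev,
                      weights_out, gradientmatrix_out, zeros):
    # hypothetical recipe: builds the graph from the refactored example for len(xs) time steps
    theta_input = ParameterNode(weights_input, gradientmatrix_input)
    theta_prev = ParameterNode(weights_prev, gradientmatrix_prev)
    theta_out = ParameterNode(weights_out, gradientmatrix_out)
    y_hat = ConcatenateNode()
    h = InputNode(zeros)                     # initial hidden state
    for x in xs:
        h = LogisticTransformNode((InputNode(x) @ theta_input) + (h @ theta_prev))
        y_hat += LogisticTransformNode(h @ theta_out)
    cost = EntropyCostNode(GroundTruthNode(y), y_hat)
    return cost, y_hat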