An implementation to create and train a simple neural network in Python, written to learn the basics of how neural networks work. Note: if you're looking for an implementation that uses automatic differentiation, take a look at scalarflow.
Run the full example:
# Create virtual environment (Tested with Python 3.10)
python -m venv venv
# Activate virtual environment
source venv/bin/activate
# Install requirements
pip install -r requirements.txt
# Run the example
python mnist.py
python boston.py
Create a tuple of layers in which each element is itself a tuple.
The first element of each inner tuple must be the layer, and the second must be the activation function applied to that layer's output:
from nn.layer import Dense
from nn.activation import ReLU, Sigmoid
layers = (
    (Dense(64), ReLU()),
    (Dense(64), ReLU()),
    (Dense(1), Sigmoid())
)
The model can then be created using the NeuralNetwork class:
from nn.loss import BinaryCrossEntropy
from nn.model import NeuralNetwork
# Adam must be imported as well; its import is omitted here.
model = NeuralNetwork(
    loss=BinaryCrossEntropy(),
    optimizer=Adam(learning_rate=0.01),
    regularization_factor=2.0,
)
The model can then be trained:
model.fit(x_train, y_train, epochs=20, verbose=True)
The training loop is in the fit method of NeuralNetwork:
class NeuralNetwork(Model):
    ...
    def fit(self, examples, labels, epochs):
        self._input = examples
        for epoch in range(1, epochs + 1):
            _ = self(self._input)  # [1]
            loss = self._loss(self._output, labels)  # [2]
            self.backward_step(labels)  # [3]
            self.update()  # [4]
    ...
At the moment, each iteration runs on the entire training set; mini-batch training is not implemented.
In each iteration, we first take a forward pass through the model with self(self._input) [1].
Then the loss is computed [2]. This step is only necessary if you plan to use the loss in some way, e.g. to log it.
The backward pass, self.backward_step(labels) [3], goes from the output layer all the way back to the inputs to compute gradients for all the learnable parameters.
Once this is done, we update the learnable parameters with the self.update() method [4].
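The four steps can also be sketched end to end without the nn package. The following is a minimal illustration, assuming a single sigmoid unit trained with plain full-batch gradient descent on made-up toy data. The train function, its parameters, and the data are hypothetical and are not part of this repository.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(x, y, epochs=200, learning_rate=0.5):
    """Full-batch gradient descent for one sigmoid unit (illustrative only).

    x has shape (features, examples); y has shape (1, examples).
    """
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.1, size=(1, x.shape[0]))
    bias = np.zeros((1, 1))
    num_examples = x.shape[-1]
    eps = 1e-12  # avoids log(0)
    losses = []
    for _ in range(epochs):
        # [1] forward pass
        output = sigmoid(weights @ x + bias)
        # [2] loss, only needed here for logging
        loss = -np.mean(y * np.log(output + eps) + (1 - y) * np.log(1 - output + eps))
        losses.append(float(loss))
        # [3] backward pass: for sigmoid + binary cross-entropy, dz is output - y
        dz = output - y
        grad_weights = dz @ x.T / num_examples
        grad_bias = np.mean(dz, axis=1, keepdims=True)
        # [4] parameter update (plain gradient descent, no optimizer state)
        weights -= learning_rate * grad_weights
        bias -= learning_rate * grad_bias
    return weights, bias, losses


# Toy data: the label is 1 when the two features sum to a positive number
rng = np.random.default_rng(1)
x_train = rng.normal(size=(2, 100))
y_train = (x_train.sum(axis=0, keepdims=True) > 0).astype(float)
weights, bias, losses = train(x_train, y_train)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Because every epoch uses all examples at once, the loss curve here is smooth; a mini-batch variant would slice x and y along the last axis before step [1].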
The forward pass is executed when the model instance is called:
class NeuralNetwork(Model):
    ...
    def __call__(self, input_tensor):
        if self._num_examples is None:
            self._num_examples = input_tensor.shape[-1]
        output = input_tensor
        for layer, activation in self._layers:
            output = layer(output)
            output = activation(output)
        self._output = output
        return self._output
    ...
The self._layers attribute is a tuple of tuples, where each inner tuple holds a layer (e.g. Dense) and an activation (e.g. ReLU).
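To make the (layer, activation) pairing concrete, here is a small self-contained sketch of the same loop. SketchDense, SketchReLU, SketchSigmoid, and forward are hypothetical stand-ins written for this example; they are assumptions about how such classes could behave, not the actual classes in nn.layer or nn.activation. Examples are assumed to lie along the last axis, as in input_tensor.shape[-1].

```python
import numpy as np


class SketchDense:
    """Hypothetical stand-in for a dense layer; weights are created on first call."""

    def __init__(self, units, seed=0):
        self.units = units
        self.weights = None
        self.bias = np.zeros((units, 1))
        self.output = None  # pre-activation values, kept for the backward pass
        self._rng = np.random.default_rng(seed)

    def __call__(self, inputs):
        if self.weights is None:
            # Shape is (units, input features), so weights @ inputs is valid
            self.weights = self._rng.normal(scale=0.1, size=(self.units, inputs.shape[0]))
        self.output = self.weights @ inputs + self.bias
        return self.output


class SketchReLU:
    def __call__(self, z):
        return np.maximum(z, 0.0)


class SketchSigmoid:
    def __call__(self, z):
        return 1.0 / (1.0 + np.exp(-z))


def forward(layers, input_tensor):
    """Apply each layer, then its activation, in order."""
    output = input_tensor
    for layer, activation in layers:
        output = activation(layer(output))
    return output


layers = (
    (SketchDense(4), SketchReLU()),
    (SketchDense(1), SketchSigmoid()),
)
x = np.ones((3, 5))  # 3 features, 5 examples
y_hat = forward(layers, x)
print(y_hat.shape)
```

Keeping the activation separate from the layer means the layer can store its pre-activation output, which is what the backward pass later feeds to activation.gradient.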
A loss function is required when instantiating the model. The loss function must implement the ILoss protocol, which returns the computed loss when the loss function instance is called.
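As a rough illustration of what such a protocol might look like, the sketch below uses typing.Protocol. The names LossSketch and BinaryCrossEntropySketch are hypothetical, and the two-method shape (a call that returns the loss, plus a gradient method) is an assumption inferred from how the model uses self._loss(...) and self._loss.gradient(...); the real ILoss definition may differ.

```python
from typing import Protocol

import numpy as np


class LossSketch(Protocol):
    """Hypothetical protocol: callable for the loss value, plus a gradient method."""

    def __call__(self, output: np.ndarray, labels: np.ndarray) -> float: ...

    def gradient(self, output: np.ndarray, labels: np.ndarray) -> np.ndarray: ...


class BinaryCrossEntropySketch:
    """Illustrative binary cross-entropy that satisfies LossSketch structurally."""

    def __init__(self, epsilon: float = 1e-12):
        self._epsilon = epsilon  # keeps probabilities away from 0 and 1

    def __call__(self, output, labels):
        p = np.clip(output, self._epsilon, 1.0 - self._epsilon)
        return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))

    def gradient(self, output, labels):
        # Per-example derivative of the loss with respect to the predicted probability
        p = np.clip(output, self._epsilon, 1.0 - self._epsilon)
        return (p - labels) / (p * (1.0 - p))


loss: LossSketch = BinaryCrossEntropySketch()
output = np.array([[0.5, 0.5]])
labels = np.array([[1.0, 0.0]])
value = loss(output, labels)
grad = loss.gradient(output, labels)
```

With a protocol, any object that provides these two methods can be passed as the loss; no inheritance is needed.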
The backward pass computes gradients for all learnable parameters of the model:
class NeuralNetwork(Model):
    ...
    def backward_step(self, labels: np.ndarray):
        # Gradient of the loss with respect to the network output
        da = self._loss.gradient(self._output, labels)
        for index in reversed(range(0, self._num_layers)):
            layer, activation = self._layers[index]
            if index == 0:
                # The first layer has no previous layer; its input is the model input
                prev_layer_output = self._input
            else:
                prev_layer, prev_activation = self._layers[index - 1]
                prev_layer_output = prev_activation(prev_layer.output)
            dz = np.multiply(da, activation.gradient(layer.output))
            layer.grad_weights = np.dot(dz, np.transpose(prev_layer_output)) / self._num_examples
            # L2 regularization term
            layer.grad_weights = layer.grad_weights + \
                (self._regularization_factor / self._num_examples) * layer.weights
            layer.grad_bias = np.mean(dz, axis=1, keepdims=True)
            # Propagate the gradient through the layer's weights (not their gradients)
            da = np.dot(np.transpose(layer.weights), dz)
            self._optimizer.layer_number = index
            self._optimizer.update_weights(layer, layer.grad_weights)
            self._optimizer.update_bias(layer, layer.grad_bias)
    ...
After calculating gradients from the loss function, we iterate over the layers
backwards all the way to the input to compute the gradients for all learnable parameters.
The computed gradients for each layer are stored on the layer instance itself, i.e. in layer.grad_weights and layer.grad_bias.
When the loop reaches the first layer, there is no previous layer output. Therefore, we set prev_layer_output to self._input, i.e. the input to the model (the examples).
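One way to gain confidence in gradient formulas like these is a finite-difference check. The sketch below is an illustration only: it assumes a single sigmoid layer with binary cross-entropy and an L2 penalty of (regularization factor / (2 * number of examples)) * sum(weights squared), which is the penalty whose derivative matches the regularization term in the weight gradient. All function names here are hypothetical and not part of the nn module.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_fn(weights, bias, x, y, reg):
    """Binary cross-entropy plus the assumed L2 penalty."""
    m = x.shape[-1]
    a = sigmoid(weights @ x + bias)
    bce = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
    return bce + reg / (2 * m) * np.sum(weights ** 2)


def analytic_gradients(weights, bias, x, y, reg):
    """Same steps as the backward pass, written for one layer."""
    m = x.shape[-1]
    a = sigmoid(weights @ x + bias)
    da = (a - y) / (a * (1 - a))  # loss gradient w.r.t. the activation
    dz = da * a * (1 - a)         # multiplied by the sigmoid derivative
    grad_weights = dz @ x.T / m + (reg / m) * weights
    grad_bias = np.mean(dz, axis=1, keepdims=True)
    return grad_weights, grad_bias


def numeric_gradient(f, param, step=1e-6):
    """Central differences, perturbing one entry of param at a time (in place)."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + step
        plus = f()
        param[idx] = original - step
        minus = f()
        param[idx] = original
        grad[idx] = (plus - minus) / (2 * step)
    return grad


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))  # 3 features, 8 examples
y = rng.integers(0, 2, size=(1, 8)).astype(float)
weights = rng.normal(scale=0.5, size=(1, 3))
bias = np.zeros((1, 1))
reg = 2.0

grad_w, grad_b = analytic_gradients(weights, bias, x, y, reg)
num_w = numeric_gradient(lambda: loss_fn(weights, bias, x, y, reg), weights)
num_b = numeric_gradient(lambda: loss_fn(weights, bias, x, y, reg), bias)
max_error = max(np.max(np.abs(grad_w - num_w)), np.max(np.abs(grad_b - num_b)))
print(f"largest difference: {max_error:.2e}")
```

If the analytic and numeric gradients disagree by more than a small tolerance, the backward formulas (or the assumed penalty) are likely wrong.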
- self._optimizer.layer_number = index: sets the optimizer's layer_number attribute to the current index. The optimizer uses layer_number to keep track of the layer being updated during the backward step.
- self._optimizer.update_weights(layer, layer.grad_weights): calls the optimizer's update_weights method with the current layer and its weight gradients (layer.grad_weights). The optimizer uses these to update the layer's weights according to its optimization algorithm (e.g. Adam, RMSprop).
- self._optimizer.update_bias(layer, layer.grad_bias): calls the optimizer's update_bias method with the current layer and its bias gradients (layer.grad_bias). The optimizer uses these to update the layer's biases according to its optimization algorithm.
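The reason an optimizer needs layer_number is that stateful algorithms keep separate running statistics for each layer. The sketch below illustrates this with a simple momentum optimizer, swapped in here because it is shorter than Adam or RMSprop. SketchLayer and SketchMomentum are hypothetical, and applying the step directly inside update_weights is a simplification; the repository's optimizers may store the step and apply it later.

```python
import numpy as np


class SketchLayer:
    """Hypothetical layer holding only the parameters the optimizer touches."""

    def __init__(self, shape):
        self.weights = np.zeros(shape)
        self.bias = np.zeros((shape[0], 1))


class SketchMomentum:
    """Hypothetical momentum optimizer with one velocity per layer and parameter."""

    def __init__(self, learning_rate=0.1, beta=0.9):
        self.learning_rate = learning_rate
        self.beta = beta
        self.layer_number = 0
        self._velocity = {}  # keyed by (layer_number, "weights" or "bias")

    def _step(self, kind, grad):
        key = (self.layer_number, kind)
        previous = self._velocity.get(key, np.zeros_like(grad))
        velocity = self.beta * previous + (1.0 - self.beta) * grad
        self._velocity[key] = velocity
        return self.learning_rate * velocity

    def update_weights(self, layer, grad_weights):
        layer.weights = layer.weights - self._step("weights", grad_weights)

    def update_bias(self, layer, grad_bias):
        layer.bias = layer.bias - self._step("bias", grad_bias)


layers = [SketchLayer((4, 3)), SketchLayer((1, 4))]
optimizer = SketchMomentum()
# Walk the layers backwards, as the backward step does
for index in reversed(range(len(layers))):
    layer = layers[index]
    optimizer.layer_number = index
    optimizer.update_weights(layer, np.ones_like(layer.weights))
    optimizer.update_bias(layer, np.ones_like(layer.bias))
```

Because the two layers have different shapes, a single shared velocity array could not work; keying the state by layer_number keeps each layer's history separate.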
Finally, the learnable parameters (weights and biases) are updated according to the optimizer's algorithm (e.g. Adam, RMSprop):
class NeuralNetwork(Model):
    ...
    def update(self):
        for ln in range(0, len(self._layers)):
            self._optimizer.layer_number = ln
            self._layers[ln][0].update(self._optimizer)
    ...
Similarly, run boston.py with:
python boston.py
Currently tested with python=3.10. Tests for the nn module are available in ./tests and can be run with python -m unittest ./tests -v. Code is formatted with black (command: black .).
- Learning rate scheduler callback
- A way to implement non-trainable layers such as Dropout
- A way to save and load model parameters