
Vanilla RNN in Python with NO libraries

This is a Python implementation of a vanilla RNN. It uses a simple toy dataset to demonstrate how to train an RNN using backpropagation through time (BPTT). The code is intended to provide a basic understanding of how RNNs work and how to implement them in Python without using any deep learning frameworks like TensorFlow or PyTorch.

Requirements

  • Python 3.x
  • NumPy

Usage

To train the RNN, run the following command:

    python main.py

This will train the RNN on a toy dataset and print the loss every 10 epochs. After training, the RNN will generate a new sequence of outputs based on the input sequence.

Code structure

The code consists of the following files:

  • main.py: The main Python script that trains and tests the RNN.

This script is the main entry point for the RNN implementation. It prepares the dataset, initializes the RNN model, trains the model, and generates a new sequence of outputs.

The script consists of the following sections:

  • Data preparation: Generates a toy dataset consisting of an input sequence and an output sequence.
  • RNN model initialization: Initializes the weight matrices and bias vectors for the RNN model.
  • Training loop: Trains the RNN model using backpropagation through time (BPTT).
  • Output generation: Generates a new sequence of outputs based on the input sequence.
  • rnn section: This part of the code defines the RNN model itself. It consists of the following functions (a sketch of the first two follows this list):
      • rnn_forward: Implements the forward pass of the RNN model.
      • mse_loss: Implements the mean squared error loss function.
      • rnn_backward: Will implement the backward pass of the RNN model to compute gradients (not yet implemented; a sketch is given under "RNN model" below).
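As a rough sketch of what rnn_forward and mse_loss might look like in plain NumPy (the exact signatures and variable names in the repository may differ):

    import numpy as np

    def rnn_forward(xs, Wxh, Whh, Why, bh, by):
        # xs: sequence of input vectors, one per time step
        h = np.zeros_like(bh)
        hs = []  # cache hidden states for backpropagation through time
        for x in xs:
            h = np.tanh(Wxh @ x + Whh @ h + bh)
            hs.append(h)
        y = Why @ h + by  # linear readout from the final hidden state
        return y, hs

    def mse_loss(y_pred, y_true):
        # Mean squared error between prediction and target
        return np.mean((y_pred - y_true) ** 2)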

Functions

The code also defines a few helper functions:

  • tanh: Implements the hyperbolic tangent activation function.
  • softmax: Implements the softmax activation function.
  • grad_tanh: Computes the gradient of the hyperbolic tangent activation function.
  • grad_softmax: Computes the gradient of the softmax activation function.
  • one_hot: Converts a vector of integers to a one-hot encoding.
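A minimal sketch of these helpers in NumPy (assuming they operate on NumPy arrays; the actual definitions may differ):

    import numpy as np

    def tanh(x):
        # Elementwise hyperbolic tangent
        return np.tanh(x)

    def grad_tanh(x):
        # Derivative of tanh: 1 - tanh(x)^2
        return 1.0 - np.tanh(x) ** 2

    def softmax(x):
        # Numerically stable softmax over the last axis
        e = np.exp(x - np.max(x, axis=-1, keepdims=True))
        return e / np.sum(e, axis=-1, keepdims=True)

    def grad_softmax(s):
        # Jacobian of softmax for a single probability vector s
        return np.diag(s) - np.outer(s, s)

    def one_hot(indices, depth):
        # Convert integer class labels to one-hot rows
        return np.eye(depth)[indices]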

RNN model

The RNN model consists of a single hidden layer with a hyperbolic tangent activation function. The final hidden state is passed through a linear output layer with no activation function.
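Concretely, for input vectors x_1, ..., x_T this corresponds to the standard vanilla-RNN update (the symbol names here are assumptions, not necessarily those used in main.py):

    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)    for t = 1, ..., T, with h_0 = 0
    y   = W_hy h_T + b_y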

The RNN model is trained with backpropagation through time (BPTT): gradients of the mean squared error loss are propagated backwards through the unrolled time steps.
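Because rnn_backward is not yet implemented, the following is only a sketch of what BPTT could look like for this architecture, assuming the rnn_forward sketch above and a loss taken on the final output only (none of these names are the repository's actual API):

    import numpy as np

    def rnn_backward(xs, hs, y_pred, y_true, Wxh, Whh, Why):
        # Gradient of the MSE loss w.r.t. the linear output
        dy = 2.0 * (y_pred - y_true) / y_pred.size
        dWhy = np.outer(dy, hs[-1])
        dby = dy
        dWxh = np.zeros_like(Wxh)
        dWhh = np.zeros_like(Whh)
        dbh = np.zeros_like(hs[-1])
        # Walk backwards through time, accumulating gradients
        dh = Why.T @ dy
        for t in reversed(range(len(xs))):
            draw = (1.0 - hs[t] ** 2) * dh   # backprop through tanh
            dbh += draw
            dWxh += np.outer(draw, xs[t])
            h_prev = hs[t - 1] if t > 0 else np.zeros_like(hs[t])
            dWhh += np.outer(draw, h_prev)
            dh = Whh.T @ draw                # gradient flowing to h_{t-1}
        return dWxh, dWhh, dWhy, dbh, dby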

Dataset

The toy dataset consists of a sequence of 3-dimensional vectors. The input sequence has 3 time steps, and the output sequence has 1 time step.

The output sequence is generated from the input sequence by adding a constant value to the first element of each input vector.
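One plausible way to build such a dataset (the constant 0.5, the random inputs, and deriving the target from the last input vector are all assumptions; main.py may use a different rule):

    import numpy as np

    T, D = 3, 3                          # 3 time steps of 3-dimensional vectors
    rng = np.random.default_rng(0)
    xs = rng.normal(size=(T, D))         # toy input sequence
    y = xs[-1].copy()
    y[0] += 0.5                          # assumed constant added to the first element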

Training

The RNN model is trained using stochastic gradient descent (SGD). The learning rate and number of epochs can be adjusted in the main.py script.
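A minimal sketch of the initialization and SGD loop, reusing the rnn_forward, mse_loss, and rnn_backward sketches above (the hidden size, learning rate, and epoch count are assumed values, not the script's actual defaults):

    H = 8                                # assumed hidden size
    Wxh = rng.normal(scale=0.1, size=(H, D))
    Whh = rng.normal(scale=0.1, size=(H, H))
    Why = rng.normal(scale=0.1, size=(D, H))
    bh, by = np.zeros(H), np.zeros(D)

    lr, epochs = 0.01, 100               # assumed hyperparameters
    for epoch in range(epochs):
        y_pred, hs = rnn_forward(xs, Wxh, Whh, Why, bh, by)
        loss = mse_loss(y_pred, y)
        dWxh, dWhh, dWhy, dbh, dby = rnn_backward(xs, hs, y_pred, y, Wxh, Whh, Why)
        # Plain SGD update on every parameter
        Wxh -= lr * dWxh
        Whh -= lr * dWhh
        Why -= lr * dWhy
        bh -= lr * dbh
        by -= lr * dby
        if epoch % 10 == 0:
            print(f"epoch {epoch}: loss {loss:.6f}")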

Output generation

After training, the RNN model generates a new sequence of outputs based on the input sequence. The generated sequence is printed to the console.
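With trained weights in hand, generation is just one more forward pass over the input sequence, e.g.:

    y_gen, _ = rnn_forward(xs, Wxh, Whh, Why, bh, by)
    print("Generated output:", y_gen)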

Conclusion

This Python implementation of a vanilla RNN provides a basic understanding of how RNNs work and how to implement them without any deep learning frameworks. The code can be easily modified to work with other datasets and network architectures.