This project is me playing around with neural network ideas from 3Blue1Brown's YouTube series. It really doesn't do much yet.
`cargo run --bin output-mnist-images`
Explorations in text embedding.
- Activation - The number that each node in the neural network holds.
- Weight - The value connecting each node to the next node in the next layer.
- Hidden layer - The layers between the input and the output.
Activations should be kept in the range 0-1. A common function for squashing a value into this range is the sigmoid function.
Logistic curve: σ(x) = 1 / ( 1 + e^-x )
This is applied by:
Activation = σ(w1a1 + w2a2 + w3a3 + ... + wnan)
Then a bias is added in. The weights tell you how strongly each input activation influences the neuron, while the bias shifts how large the weighted sum must be before the neuron activates.

Activation = σ(w1a1 + w2a2 + w3a3 + ... + wnan - bias)

Feed forward, writing a layer's weights as a matrix W and its biases as a vector b (with the sign folded into b):

a¹ = σ(Wa⁰ + b)

a² = σ(Wa¹ + b)
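To make the feed-forward step concrete, here is a minimal Rust sketch of a single fully connected layer. The `Layer` struct, its field names, and the hand-picked numbers are illustrative assumptions, not the types used in this repo.

```rust
/// The logistic sigmoid, squashing any real number into the 0-1 range.
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// A single fully connected layer: `weights[j][k]` connects input k to output neuron j.
struct Layer {
    weights: Vec<Vec<f64>>,
    biases: Vec<f64>,
}

impl Layer {
    /// Compute a¹ = σ(W a⁰ + b) for this layer.
    fn feed_forward(&self, inputs: &[f64]) -> Vec<f64> {
        self.weights
            .iter()
            .zip(&self.biases)
            .map(|(row, bias)| {
                let z: f64 = row.iter().zip(inputs).map(|(w, a)| w * a).sum();
                sigmoid(z + bias)
            })
            .collect()
    }
}

fn main() {
    // Two inputs feeding a layer of three neurons, with hand-picked weights.
    let layer = Layer {
        weights: vec![vec![0.5, -0.2], vec![1.0, 0.3], vec![-0.7, 0.8]],
        biases: vec![0.0, -0.5, 0.1],
    };
    let a0 = [0.9, 0.1];
    println!("{:?}", layer.feed_forward(&a0));
}
```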
Learning: Finding the right weights and biases to solve your problem.
In order to train the neural network you need a cost function.
- https://towardsdatascience.com/step-by-step-tutorial-on-linear-regression-with-stochastic-gradient-descent-1d35b088a843
- http://neuralnetworksanddeeplearning.com/
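As a sketch of what such a cost function can look like, the snippet below computes the quadratic cost used in the 3Blue1Brown series: the squared distance between the network's output and the desired output, averaged over the training set. The function names are made up for illustration.

```rust
/// Quadratic cost for a single training example: the squared distance between
/// the network's output vector a^L and the desired output vector y.
fn example_cost(output: &[f64], desired: &[f64]) -> f64 {
    output
        .iter()
        .zip(desired)
        .map(|(a, y)| (a - y).powi(2))
        .sum()
}

/// The full cost is the average of the per-example costs over the whole training set.
fn total_cost(examples: &[(Vec<f64>, Vec<f64>)]) -> f64 {
    let sum: f64 = examples
        .iter()
        .map(|(output, desired)| example_cost(output, desired))
        .sum();
    sum / examples.len() as f64
}
```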
Training works by taking the derivative of the cost with respect to each weight and bias, which the chain rule expands. For a weight in the output layer:

$$ {\partial C \over \partial w^L_{jk}} = {\partial C \over \partial a^L_j} \cdot {\partial a^L_j \over \partial z^L_j} \cdot {\partial z^L_j \over \partial w^L_{jk}} $$

For a bias the last factor is 1, since ${\partial z^L_j \over \partial b^L_j} = 1$:

$$ {\partial C \over \partial b^L_j} = {\partial C \over \partial a^L_j} \cdot {\partial a^L_j \over \partial z^L_j} $$
The cost function and the rest of the notation used in these derivatives are defined in the following table:
Formula | Description |
---|---|
$w^L$ | The weights vector for a layer. |
$y$ | The desired output vector for a layer. |
$b^L$ | The bias vector for a layer. |
$L$ | The current layer. |
$L-1$ | The previous layer. |
$\eta$ | The learning rate, eta, applied to the gradient when updating the weights and biases. |
$ \sigma(x) = {\Large 1 \over {1 + e^{-x}}}$ | The logistic sigmoid function keeps the activations between 0 and 1. |
$ z^L = w^L a^{L-1} + b^L $ | The interior of the activation function, without the sigmoid applied. |
$ a^L = \sigma(z^L) $ | The activation function requires a sigmoid, which keeps the values between 0 and 1. |
$ a^L = \sigma(w^L a^{L-1} + b^L) $ | The full activation function. |
$ C = (a^L - y)^2 $ | Cost. |
$ {\partial C \over \partial w^L_{jk}} = {\partial C \over \partial a^L_j} \cdot {\partial a^L_j \over \partial z^L_j} \cdot {\partial z^L_j \over \partial w^L_{jk}} $ | The partial derivative of the cost with respect to the weight, determined using the chain rule. |
$ {\partial z^L_j \over \partial w^L_{jk}} = a^{L-1}_k $ | Derivative of $z^L_j$ with respect to the weight. |
$ {\partial a^L_j \over \partial z^L_j} = \sigma'(z^L_j) $ | Derivative of the activation with respect to $z^L_j$. |
$ {\partial C \over \partial a^L_j} = 2(a^L_j - y_j) $ | Derivative of the cost with respect to the activation. |
$ {\partial C \over \partial w^L_{jk}} = a^{L-1}_k \cdot \sigma'(z^L_j) \cdot 2(a^L_j - y_j) $ | The formula for the derivative. |
$ {\partial C \over \partial w^L} = {\Large 1 \over n} \sum_{i=0}^{n-1} {\partial C_i \over \partial w^L} $ | The derivative of the full cost function is the average over all training examples. |
$ w^L \gets w^L - \eta {\partial C \over \partial w^L} $ | The weight for the next training round. |
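Putting the pieces of the table together, here is a rough Rust sketch of one gradient-descent update of an output layer for a single training example. It assumes the quadratic cost and sigmoid activation from above; the function and parameter names are illustrative, not the ones in this repo.

```rust
/// σ'(z) written in terms of the activation a = σ(z): σ'(z) = a(1 - a).
fn sigmoid_prime_from_activation(a: f64) -> f64 {
    a * (1.0 - a)
}

/// One gradient-descent step on an output layer for a single training example.
/// `weights[j][k]` connects input activation k (in a^(L-1)) to output neuron j.
fn update_output_layer(
    weights: &mut [Vec<f64>],
    biases: &mut [f64],
    prev_activations: &[f64], // a^(L-1)
    activations: &[f64],      // a^L, produced by the feed-forward pass
    desired: &[f64],          // y
    eta: f64,                 // learning rate η
) {
    for (j, row) in weights.iter_mut().enumerate() {
        // ∂C/∂a^L_j · ∂a^L_j/∂z^L_j for the quadratic cost and sigmoid activation.
        let delta = 2.0 * (activations[j] - desired[j])
            * sigmoid_prime_from_activation(activations[j]);
        for (k, w) in row.iter_mut().enumerate() {
            // ∂C/∂w^L_jk = a^(L-1)_k · σ'(z^L_j) · 2(a^L_j - y_j)
            let grad = prev_activations[k] * delta;
            *w -= eta * grad; // w ← w - η ∂C/∂w
        }
        // ∂z^L_j/∂b^L_j = 1, so the bias gradient is just delta.
        biases[j] -= eta * delta;
    }
}
```

In a full training loop you would average these gradients over the training examples (the 1/n sum in the table) before applying the learning rate.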