- Understanding the MNIST dataset structure
- Loading and normalizing the data
- Splitting into training and test sets
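A minimal sketch of these three steps, assuming the Keras-bundled copy of MNIST is used only to fetch the raw arrays (any IDX loader would do); the flattening, scaling, and one-hot encoding are plain NumPy:

```python
import numpy as np
from tensorflow.keras.datasets import mnist  # used only to fetch the raw arrays

# MNIST ships pre-split into 60,000 training and 10,000 test images.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten each 28x28 image into a 784-vector and scale pixel values to [0, 1].
x_train = x_train.reshape(-1, 784).astype(np.float32) / 255.0
x_test = x_test.reshape(-1, 784).astype(np.float32) / 255.0

# One-hot encode the digit labels 0-9 for use with cross-entropy later.
y_train_oh = np.eye(10)[y_train]
y_test_oh = np.eye(10)[y_test]
```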
- Input layer (784 neurons for 28x28 images)
- Hidden layer(s)
- Output layer (10 neurons for digits 0-9)
```mermaid
graph TD
    subgraph "Network Architecture"
        A[Input Layer] -- Weights --> B[Hidden Layer 1]
        B -- Weights --> C[Hidden Layer 2]
        C -- Weights --> D[Output Layer]
    end
```
```mermaid
graph TD
    E[Neuron] --> F[Weights]
    E --> G[Bias]
    E --> H[Activation Function]
```
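In code, those three ingredients become a weight matrix, a bias vector, and an activation callable per layer. A sketch, assuming two hidden layers of 64 units each (the hidden sizes are illustrative, not prescribed by the outline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 784 inputs, two hidden layers (sizes are illustrative), 10 outputs.
layer_sizes = [784, 64, 64, 10]

# Each layer owns a weight matrix and a bias vector: small random weights, zero biases.
weights = [rng.normal(0.0, 0.01, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def neuron_output(x, w, b, activation):
    """One neuron/layer: weighted sum of the inputs, plus bias, through an activation."""
    return activation(x @ w + b)
```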
- Sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$
- ReLU: $f(x) = \max(0, x)$
- Tanh: $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
- Softmax (for the output layer): $f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$
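These formulas translate almost one-to-one into NumPy. A sketch; the max-subtraction inside softmax is a standard numerical-stability trick rather than part of the formula itself:

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^-x), squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # max(0, x), zeroes out negative inputs
    return np.maximum(0.0, x)

def tanh(x):
    # (e^x - e^-x) / (e^x + e^-x), squashes values into (-1, 1)
    return np.tanh(x)

def softmax(x):
    # e^x_i / sum_j e^x_j along the last axis; subtracting the max avoids overflow
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)
```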
- Matrix multiplication and bias addition
- Applying activation functions
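A sketch of the forward pass built from those two operations, written generically so it works with the per-layer `weights` and `biases` lists and the activation functions sketched earlier:

```python
import numpy as np

def forward(x, weights, biases, hidden_act, output_act):
    """Forward pass: an affine step (x @ W + b) followed by an activation, layer by layer."""
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = hidden_act(a @ w + b)                         # hidden layers
    return output_act(a @ weights[-1] + biases[-1])       # output layer (e.g. softmax)
```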
- Cross-entropy loss for classification
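One way to write it, assuming softmax probabilities and one-hot labels shaped `(batch, 10)`; the small epsilon only guards against `log(0)`:

```python
import numpy as np

def cross_entropy(probs, targets_one_hot, eps=1e-12):
    """Mean cross-entropy: -sum(y * log(p)), averaged over the batch.

    probs:            (batch, 10) softmax outputs
    targets_one_hot:  (batch, 10) one-hot labels
    """
    return -np.mean(np.sum(targets_one_hot * np.log(probs + eps), axis=1))
```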
- Computing gradients using the chain rule
- Updating weights and biases
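A sketch of one backward pass, reduced to a single hidden layer (a simplifying assumption, not the two-hidden-layer diagram above) so the chain rule stays readable; it folds the softmax and cross-entropy derivatives into the well-known `probs - y` term and applies a plain gradient-descent update:

```python
import numpy as np

def backprop_step(x, y_one_hot, w1, b1, w2, b2, lr=0.1):
    """One gradient step for a single-hidden-layer net (sigmoid hidden, softmax output).

    Shapes: x (batch, 784), y_one_hot (batch, 10),
            w1 (784, h), b1 (h,), w2 (h, 10), b2 (10,).
    """
    n = x.shape[0]

    # Forward pass.
    z1 = x @ w1 + b1
    a1 = 1.0 / (1.0 + np.exp(-z1))                       # sigmoid hidden layer
    z2 = a1 @ w2 + b2
    e = np.exp(z2 - z2.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)             # softmax output

    # Backward pass (chain rule). For softmax + cross-entropy, dL/dz2 = probs - y.
    dz2 = (probs - y_one_hot) / n
    dw2, db2 = a1.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ w2.T) * a1 * (1.0 - a1)                 # sigmoid derivative
    dw1, db1 = x.T @ dz1, dz1.sum(axis=0)

    # Gradient-descent update of all weights and biases.
    return w1 - lr * dw1, b1 - lr * db1, w2 - lr * dw2, b2 - lr * db2
```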
- Stochastic Gradient Descent (SGD)
- Learning rate and mini-batch size
- Iterating through epochs and mini-batches
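A sketch of the training loop under these choices: shuffle the training set each epoch, slice it into mini-batches, and hand each batch to a parameter-update step such as `backprop_step` above (the `step_fn` signature here is an assumption of this sketch):

```python
import numpy as np

def train_sgd(x_train, y_train_oh, params, step_fn, epochs=10, batch_size=64, lr=0.1):
    """Mini-batch SGD: shuffle each epoch, then update on one mini-batch at a time.

    step_fn(x_batch, y_batch, params, lr) -> new params.
    """
    rng = np.random.default_rng(0)
    n = x_train.shape[0]
    for epoch in range(epochs):
        order = rng.permutation(n)                        # new shuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            params = step_fn(x_train[idx], y_train_oh[idx], params, lr)
        print(f"epoch {epoch + 1}/{epochs} done")
    return params
```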
- Accuracy calculation on test set
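A small helper for that, assuming `probs` are the network's softmax outputs on the test set and `labels` are the integer digit labels:

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of test images whose highest-probability class matches the true digit."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))
```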
- Techniques for identifying and fixing issues, such as numerical gradient checking (see the sketch after this list)
- Strategies for improving performance
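A sketch of one such debugging technique, finite-difference gradient checking: perturb each parameter, recompute the loss, and compare the numerical gradient with the one produced by backpropagation. Here `loss_fn` is assumed to be a closure that recomputes the loss using the current, temporarily perturbed parameter array:

```python
import numpy as np

def numerical_grad(loss_fn, param, eps=1e-5):
    """Finite-difference gradient of loss_fn with respect to `param` (for debugging).

    Large relative differences between this result and the analytic gradient from
    backprop usually point to a bug in the chain-rule code.
    """
    grad = np.zeros_like(param)
    it = np.nditer(param, flags=["multi_index"])
    while not it.finished:
        i = it.multi_index
        original = param[i]
        param[i] = original + eps
        loss_plus = loss_fn()
        param[i] = original - eps
        loss_minus = loss_fn()
        param[i] = original                 # restore the original value
        grad[i] = (loss_plus - loss_minus) / (2.0 * eps)
        it.iternext()
    return grad
```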