-
Parameter Initialization: You've initialized the model's parameters with random weights drawn from a normal distribution and set the bias to 0. This initialization applies to all the Linear layers in both models.
-
Weight Initialization: If 'n' represents the number of nodes in a Linear Layer, the weights are samples from a normal distribution in the range (0, y). In this case, 'y' represents the standard deviation, calculated as y=1.0/sqrt(n).
-
Choice of Normal Distribution: You've opted for a normal distribution because it has a higher probability of choosing weights close to zero compared to higher values. This is in contrast to a uniform distribution, where the probability of selecting any value is the same.
-
Convolution Layer 1: Your CNN has two convolution layers. The first layer accepts a single channel as input (for grayscale images) and applies a kernel of size 3x3 with stride 1. This layer also features 16 output channels, which means it extracts 16 feature maps. Image padding of size 1 is used to maintain the input and output dimensions, resulting in an output size of 16 x 28 x 28.
-
Activation Layer 1: After the first convolution layer, a Rectified Linear Unit (ReLU) activation function is applied.
-
Pooling Layer 1: This layer uses a max-pooling technique with a kernel size of 2 and stride 2. This step effectively down-samples the feature maps, reducing their dimensions to 16 x 14 x 14.
-
Convolution Layer 2: The second convolution layer accepts 16 input channels and outputs 32 channels, implying that it extracts 32 feature maps. Again, a 3x3 kernel with stride 1 and padding of size 1 is used. The output size at this layer is 32 x 14 x 14.
-
Activation Layer 2: A ReLU activation function is used after the second convolution layer.
-
Pooling Layer 2: Another max-pooling layer is used here, with the same kernel size and stride as before, reducing the feature map dimensions to 32 x 7 x 7.
-
Fully Connected Layers: Your CNN has three fully connected layers:
- First Fully Connected Layer: A flattened version of the feature maps is passed into this layer, which contains 1568 nodes.
- Second Fully Connected Layer: This layer contains 512 nodes.
- Third Fully Connected Layer: This layer has 256 nodes and leads to the final output layer.
- Output Layer: The final output layer has 10 nodes, corresponding to the 10 class labels.
-
Test Accuracy: The test accuracy of your CNN, with a dropout probability of 20%, is 99.15%%.