A brief overview of the topics I covered while learning about CNNs
A CNN (Convolutional Neural Network) is a class of deep learning neural networks, most commonly applied to analyzing visual imagery. CNNs are also known as shift-invariant or space-invariant artificial neural networks (SIANN). Common applications include:
- Image and Video Recognition
- Recommender Systems
- Image Classification
- Medical Image Analysis
- Natural Language Processing
- Financial Time Series Analysis
Some important topics that need to be covered:
The Hadamard product is the element-wise product of two matrices of the same dimensions. It is used during kernel (convolution) calculations.
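A quick NumPy sketch of the Hadamard product and how it shows up in a convolution step (the matrices here are made-up toy values):

```python
import numpy as np

# Two 2x2 matrices (same dimensions, as the Hadamard product requires)
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Element-wise (Hadamard) product: each entry is A[i, j] * B[i, j]
H = A * B
print(H)  # [[ 5 12]
          #  [21 32]]

# In a convolution, one output value is the sum of the Hadamard
# product of the kernel and the image patch it currently covers:
patch = np.array([[0, 1],
                  [2, 3]])
kernel = np.array([[1, 0],
                   [0, 1]])
out = np.sum(patch * kernel)  # 0*1 + 1*0 + 2*0 + 3*1 = 3
```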
Covariate shift refers to a change in the distribution of the input variables between the training and the test data. Drifting features can be detected with the following steps:
- Preprocessing: This step involves imputing all missing values and label encoding of all categorical variables.
- Create a random sample of your training and test data separately, and add a new feature 'origin' whose value is 'train' or 'test' depending on whether the observation comes from the training or the test dataset.
- Combine these random samples into a single dataset. Note that the two samples should be of roughly equal size; otherwise the combined dataset will be unbalanced.
- Build a model on a part of the combined dataset (say ~75%), taking one feature at a time with 'origin' as the target variable.
- Predict on the remaining part (~25%) of the dataset and calculate the AUC-ROC.
- If the AUC-ROC for a particular feature is greater than 0.80, classify that feature as drifting.
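The steps above can be sketched with scikit-learn on synthetic data. Everything here (the feature names `f1`/`f2`, the logistic-regression probe, the toy distributions) is illustrative; only the 75/25 split and the 0.80 threshold come from the text.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Toy data: 'f1' drifts between train and test, 'f2' does not.
train = pd.DataFrame({"f1": rng.normal(0, 1, 500), "f2": rng.normal(0, 1, 500)})
test = pd.DataFrame({"f1": rng.normal(3, 1, 500), "f2": rng.normal(0, 1, 500)})

# Add the 'origin' feature: 0 = train, 1 = test, then combine.
train["origin"] = 0
test["origin"] = 1
combined = pd.concat([train, test], ignore_index=True)

drifting = []
for feat in ["f1", "f2"]:
    X, y = combined[[feat]], combined["origin"]
    # Fit on ~75% of the combined data, evaluate AUC-ROC on the rest.
    X_fit, X_val, y_fit, y_val = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y)
    clf = LogisticRegression().fit(X_fit, y_fit)
    auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
    if auc > 0.80:  # threshold from the text
        drifting.append(feat)

print(drifting)  # f1 should be flagged as drifting; f2 should not
```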
There are two ways to treat Covariate Shift in Data[2]:
- Dropping of drifting features
- Importance weight using Density Ratio Estimation
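A hedged sketch of the second option, importance weighting via density ratio estimation. One common approach (assumed here, not prescribed by the text) reuses a train-vs-test classifier: with equal sample sizes, the density ratio can be estimated as w(x) = p_test(x) / p_train(x) ≈ P(test | x) / P(train | x).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Toy 1-D data whose distribution shifts between train and test.
X_train = rng.normal(0, 1, size=(500, 1))
X_test = rng.normal(1, 1, size=(500, 1))

# Label the origin: 0 = train, 1 = test, and fit a probabilistic classifier.
X = np.vstack([X_train, X_test])
y = np.concatenate([np.zeros(500), np.ones(500)])
clf = LogisticRegression().fit(X, y)

# Estimated density ratio for each training point:
# w(x) = P(test | x) / P(train | x)  (equal sample sizes assumed;
# otherwise scale by n_train / n_test).
p_test = clf.predict_proba(X_train)[:, 1]
weights = p_test / (1.0 - p_test)

# Training points that look more like test points get larger weights;
# these can be passed as sample_weight when fitting the downstream model.
```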
During the training stage of networks, as the parameters of the preceding layers change, the distribution of inputs to the current layer changes accordingly, such that the current layer needs to constantly readjust to new distributions. This problem is especially severe for deep networks, because small changes in shallower hidden layers will be amplified as they propagate within the network, resulting in significant shift in deeper hidden layers.[1]
Batch Normalization was introduced to reduce these unwanted shifts, to speed up training, and to produce more reliable models.
- Batch Normalization mitigates internal covariate shift.
- The network can use higher learning rates without vanishing or exploding gradients.
- It also appears to have a regularizing effect that improves the network's generalization, so dropout may become unnecessary for mitigating overfitting.
- It has also been observed that with batch norm the network becomes more robust to different initialization schemes and learning rates.
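A minimal NumPy sketch of the batch-norm forward pass in training mode, following the formulation in [1]: normalize each feature over the mini-batch, then scale and shift with the learnable parameters gamma and beta (the batch shape and input distribution below are made up for illustration).

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization over a mini-batch x of shape (batch, features)."""
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta            # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=(32, 4))  # a batch far from zero mean, unit variance
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))  # ~0 for every feature
print(out.std(axis=0))   # ~1 for every feature
```

With gamma = 1 and beta = 0 the layer outputs zero-mean, unit-variance activations regardless of the input distribution, which is what stabilizes the distributions the following layers see.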