This is a series of scripts written while following the Machine Learning from Scratch tutorials on the Python Engineer YouTube channel, available at https://www.youtube.com/playlist?list=PLqnslRFeH2Upcrywf-u2etjdxxkL8nl7E. In these tutorials, the instructor teaches how to implement popular machine learning algorithms using only Python and NumPy, without any additional libraries.

What I learnt:

K Nearest Neighbours:

  • A sample is classified by a majority vote of its nearest neighbours, i.e. if k=3 and 2 of the 3 nearest points on the graph belong to the same class, then the sample will be labelled as part of that class.
  • For this to work, labelled training samples must be provided (i.e. points from multiple different classes plotted on the graph)
  • To calculate distances, we use the Euclidean distance (a rough sketch of the full algorithm is given after the demo below)

KNN Demo from tutorial video
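As a minimal sketch of the idea described above (assuming NumPy; the function names are my own, not necessarily the tutorial's):

```python
import numpy as np
from collections import Counter

def euclidean_distance(a, b):
    # Straight-line distance between two feature vectors
    return np.sqrt(np.sum((a - b) ** 2))

def knn_predict(X_train, y_train, x, k=3):
    # Distance from the query point x to every training sample
    distances = [euclidean_distance(x, x_train) for x_train in X_train]
    # Indices of the k closest training samples
    k_indices = np.argsort(distances)[:k]
    # Majority vote among the labels of those neighbours
    k_labels = [y_train[i] for i in k_indices]
    return Counter(k_labels).most_common(1)[0][0]
```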

Linear Regression:

  • ŷ = wx + b (w = weights, b = bias)
  • To find the weights and the bias, a cost function is used:

MSE
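Written out (reconstructed from the definitions above, with N training samples):

```latex
MSE = J(w, b) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (w x_i + b) \right)^2
```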

  • Since this measures the error, we want to minimize it, i.e. find the minimum of this function. To do this we need to find the derivative:

MSE Derivative
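In symbols, the standard derivation gives:

```latex
\frac{\partial J}{\partial w} = \frac{1}{N} \sum_{i=1}^{N} -2 x_i \left( y_i - (w x_i + b) \right), \qquad
\frac{\partial J}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} -2 \left( y_i - (w x_i + b) \right)
```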

  • This gives the gradient of the cost function with respect to w and with respect to b
  • Now we use gradient descent, which is an iterative technique to find the minimum point:

Gradient Descent

  • "So we have some initialization of the weights and the bias and then we want to go into the direction of the steepest descent and the steepest descent is also the gradient so we want to go into the direction of the into the negative direction of the gradient and we do this iteratively until we finally reached the minimum"
  • To do this iteratively, we need some update rules:

Linear Regression - Update rules and derivatives
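A minimal sketch of the training loop with these update rules (assuming NumPy; the code and names are mine, not copied from the tutorial):

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.01, n_iters=1000):
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # initialise the weights
    b = 0.0                   # initialise the bias
    for _ in range(n_iters):
        y_pred = X.dot(w) + b
        # Gradients of the MSE with respect to w and b
        dw = (2 / n_samples) * X.T.dot(y_pred - y)
        db = (2 / n_samples) * np.sum(y_pred - y)
        # Update rules: step in the negative direction of the gradient
        w -= lr * dw
        b -= lr * db
    return w, b
```

The step size `lr` here is the learning rate discussed in the next bullet.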

  • The learning rate is a very important parameter: a small learning rate may be slower but more accurate, while a large learning rate may be faster but can overshoot and never find the minimum point.

Comparison of learning rates

Logistic Regression:

  • "In statistics, the logistic model is used to model the probability of a certain class or event existing such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events such as determining whether an image contains a cat, dog, lion, etc"
  • In linear regression, we use the formula f(w,b) = wx + b, which outputs continuous values. To turn this into a probability, we use the sigmoid function:

Sigmoid Function
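Written out, the sigmoid squashes any real number into the range (0, 1):

```latex
s(x) = \frac{1}{1 + e^{-x}}
```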

  • Applying the sigmoid function to the linear model f(w,b) = wx + b (w = weights, b = bias) gives the following approximation:

Logistic Regression Approximations
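In symbols, the predicted probability becomes:

```latex
\hat{y} = s(wx + b) = \frac{1}{1 + e^{-(wx + b)}}
```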

  • This will output a probability between 0 and 1
  • This is the cost function we use:

Logistic Regression + Cost Function
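The image shows the cross-entropy cost; in its standard form for N training samples it reads:

```latex
J(w, b) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
```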

  • To optimize this cost function, we again use gradient descent. These are the update rules and derivatives for the logistic regression algorithm:

Logistic Regression - Update rules + Derivatives
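A minimal sketch of training and prediction (assuming NumPy; the names are mine, not necessarily the tutorial's):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def fit_logistic_regression(X, y, lr=0.01, n_iters=1000):
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        y_pred = sigmoid(X.dot(w) + b)
        # The gradients take the same form as in linear regression
        dw = (1 / n_samples) * X.T.dot(y_pred - y)
        db = (1 / n_samples) * np.sum(y_pred - y)
        w -= lr * dw
        b -= lr * db
    return w, b

def predict(X, w, b, threshold=0.5):
    # Probabilities at or above the threshold are labelled class 1
    return (sigmoid(X.dot(w) + b) >= threshold).astype(int)
```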

Naive Bayes Classifier:

  • Based on Bayes' theorem, which states that if we have two events A and B, then the probability of event A given that B has already happened is equal to the probability of B given A, times the probability of A, divided by the probability of B:

Bayes Theorem
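In symbols:

```latex
P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}
```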

  • In our case, with a feature vector X and a class label y, we use it like so:

How we will use the Bayes theorem
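Written out:

```latex
P(y \mid X) = \frac{P(X \mid y) \, P(y)}{P(X)}
```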

  • We then use the chain rule, together with the naive independence assumption explained below, to get the following:

The Bayes theorem after applying the chain rule
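Assuming the features x_1, ..., x_n are mutually independent, the class conditional probability factorises into a product:

```latex
P(y \mid X) = \frac{P(x_1 \mid y) \cdot P(x_2 \mid y) \cdots P(x_n \mid y) \cdot P(y)}{P(X)}
```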

  • Terminology:

    • P(y|X) is called the posterior probability
    • P(X|y) is called the class conditional probability
    • P(y) is called the prior probability of y
    • P(X) is called the prior probability of X
  • It is called Naive Bayes because it assumes that all features (the factors contributing to the overall probability) are mutually independent, which is unlikely in the real world

  • "For example if you want to predict the probability that a person is going out for a run given the feature that the sun is shining and also given the feature that the person is healthy, then both of these features might be independent but both contribute to this probability that the person goes out. In real life a lot of features are not mutually independent but this assumption works fine for a lot of problems"

  • We then select the class with the highest probability, using the first formula given below. Since P(X) does not depend on y, we can ignore it (second formula). Finally, we take logarithms to get the third formula: because all the probabilities are between 0 and 1, their product becomes a very small number, which could lead to numerical underflow.

How to select the class with the highest probability
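Reconstructed from the description above, the three formulas are:

```latex
y = \operatorname*{argmax}_{y} P(y \mid X) = \operatorname*{argmax}_{y} \frac{P(x_1 \mid y) \cdots P(x_n \mid y) \, P(y)}{P(X)}
y = \operatorname*{argmax}_{y} P(x_1 \mid y) \cdot P(x_2 \mid y) \cdots P(x_n \mid y) \cdot P(y)
y = \operatorname*{argmax}_{y} \log(P(x_1 \mid y)) + \cdots + \log(P(x_n \mid y)) + \log(P(y))
```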

  • In the end, the prior P(y) is simply the frequency of class y in the training data
  • The class conditional probability is calculated as follows:

Class Conditional Probability Calculation
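The tutorial models this with a Gaussian (normal) distribution per class (Gaussian Naive Bayes), using the mean μ_y and variance σ_y² of each feature within class y:

```latex
P(x_i \mid y) = \frac{1}{\sqrt{2 \pi \sigma_y^2}} \exp\!\left( -\frac{(x_i - \mu_y)^2}{2 \sigma_y^2} \right)
```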

Perceptron:

  • The perceptron can be seen as one single unit of an artificial neural network
  • It is a simplified model of a biological neuron and it simulates the behavior of only one cell
  • Inputs (weighted and summed) -> activation function -> output
  • In this code, we use the unit step function as our activation function (a rough sketch follows below).
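A minimal sketch of a single perceptron with the unit step activation (assuming NumPy and labels in {0, 1}; the names are mine, not necessarily the tutorial's):

```python
import numpy as np

def unit_step(x):
    # Activation function: 1 once the weighted sum is non-negative, else 0
    return np.where(x >= 0, 1, 0)

def fit_perceptron(X, y, lr=0.01, n_iters=1000):
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        for xi, target in zip(X, y):
            # Weighted sum of the inputs, then the activation function
            prediction = unit_step(np.dot(xi, w) + b)
            # Perceptron update rule: nudge w and b towards the target
            update = lr * (target - prediction)
            w += update * xi
            b += update
    return w, b

def predict_perceptron(X, w, b):
    return unit_step(X.dot(w) + b)
```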