/Machine-Learning-From-the-begining

In this repository, you will find my day-by-day notes on Machine Learning.

Machine Learning Full Course

**TABLE OF CONTENTS**

Days Topic
1 Supervised Learning
2 Unsupervised Learning
3 Cost Function
4 Gradient Descent Algorithm
5 Multiple Linear Regression
6 Feature Scaling with Gradient Descent
7 Classification with Logistic Regression

Day 1

Introduction:

Arthur Samuel popularized the term Machine Learning. Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. It is a branch of artificial intelligence (AI) and computer science that focuses on the use of data and algorithms to imitate the way humans learn, gradually improving their accuracy.

There are two main types of Machine Learning algorithms. They are:

  1. Supervised learning
  2. Unsupervised learning

The other types of machine learning algorithms are:

  • Recommender system
  • Reinforcement learning

1. Supervised Learning

Supervised learning is a type of machine learning in which the model learns from labeled data, that is, data that comes with the "right answers". In supervised learning, the algorithm is trained on a dataset that consists of input-output pairs, where each input is accompanied by its correct output label. The goal of supervised learning is to learn a mapping function from input to output so that it can predict the output for new, unseen input data.

There are two types of supervised learning:

  1. Regression, e.g. housing price prediction
  2. Classification, e.g. finding out whether a lump is benign or malignant

Difference between Regression and Classification

a. - Regression tries to predict a number from infinitely many possible numbers.

  • Classification tries to predict a category, for example either 0 or 1.

b. - Regression: infinitely many possible outputs

  • Classification: a small number of possible output categories

Day 2

2. Unsupervised Learning

Unsupervised learning is a type of machine learning in which the algorithm learns from unlabelled data and finds something interesting in it. Because we do not give it the right answers, the algorithm has to find structure in the data by itself, for example by grouping the data into clusters.

A few examples of unsupervised learning applications are:

  • Clustering Google News articles
  • Clustering DNA microarray data
  • Clustering or grouping customers
  • Clustering similar kinds of data

Day 3

Cost Function

A cost function is a mathematical function that quantifies the difference between the predicted values of a model and the actual observed values. For linear regression, the cost function J(w, b) depends on the parameters w and b. Because the sum of squared errors grows with the number of training examples m, it is divided by 2m: dividing by m gives an average, and the extra factor of 2 simplifies the derivative. The goal is to find the values of w and b that minimize the cost function, which measures how well the model's predictions align with the actual target values.

When the cost is at its minimum, or close to zero, the model fits the data better than it does with other choices of the parameters w and b.
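As a minimal sketch of the idea above, the squared-error cost for a one-feature linear model f(x) = w*x + b could be computed as follows. The function name `compute_cost` and the toy data are assumptions made for illustration, not part of the original notes.

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) = (1 / (2m)) * sum((w*x + b - y)^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.shape[0]                      # number of training examples
    predictions = w * x + b             # model output for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Hypothetical toy data that lies exactly on the line y = 2x + 1
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([3.0, 5.0, 7.0])

cost_good = compute_cost(x_train, y_train, w=2.0, b=1.0)  # parameters that fit the data
cost_bad = compute_cost(x_train, y_train, w=0.0, b=0.0)   # parameters that do not
```

With this toy data, the parameters that match the line give a cost of zero, while the poor choice gives a clearly larger cost.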

Day 4

Gradient Descent Algorithm

Today, I learned about the gradient descent algorithm, which plays a significant role in decreasing the cost function. In this algorithm, 'w' and 'b' are updated repeatedly by subtracting the learning rate multiplied by the derivative of the cost function with respect to that parameter. The learning rate is a small positive number, typically between 0 and 1. With a very low learning rate, gradient descent takes only baby steps and converges slowly. With too high a learning rate, it can overshoot the minimum and fail to converge.

Note

When updating 'w' and 'b' simultaneously, first compute the new values of both in temporary variables, and then copy the temporary variables to 'w' and 'b'. Otherwise, the already-updated value of 'w' will be used when computing the update for 'b', leading to incorrect final values.
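The simultaneous update described in the note might look like the sketch below for a one-feature linear model. The helper names, the learning rate `alpha = 0.1`, the iteration count, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def compute_gradient(x, y, w, b):
    """Derivatives of the squared-error cost with respect to w and b."""
    m = x.shape[0]
    error = (w * x + b) - y
    dj_dw = np.sum(error * x) / m
    dj_db = np.sum(error) / m
    return dj_dw, dj_db

def gradient_descent(x, y, w=0.0, b=0.0, alpha=0.1, num_iters=5000):
    for _ in range(num_iters):
        # Both gradients are computed from the OLD values of w and b
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        tmp_w = w - alpha * dj_dw
        tmp_b = b - alpha * dj_db
        # Only now are the temporary values copied back (simultaneous update)
        w, b = tmp_w, tmp_b
    return w, b

# Hypothetical toy data that lies exactly on the line y = 2x + 1
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([3.0, 5.0, 7.0])

w_final, b_final = gradient_descent(x_train, y_train)
```

On this toy data the parameters should approach w = 2 and b = 1. A much larger `alpha` would make the updates overshoot instead of converge.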

Day 5

Multiple Linear Regression using vectorization

Multiple linear regression is linear regression on a dataset that has multiple features (variables) used to make a prediction. For example, the area of the land, the number of bedrooms, the number of floors, and the age of the home may all be needed to predict the price of the home.

The calculations behind the prediction and gradient descent can be implemented in several ways:

  • Simple method, i.e. writing out and adding each term of the arrays individually

  • Using a for loop to add each term of the arrays

  • Using a mathematical summation function

  • Using vectorization

    The methods can be shown as:

Among these methods, we can use vectorization, which performs faster because it relies on the Python library NumPy for the calculations.

It can be shown as:
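A minimal sketch of the four approaches listed above, applied to the prediction f = w·x + b for a single example. The values of `w`, `x`, and `b` are made up for illustration.

```python
import numpy as np

# Hypothetical parameters and one example with three features
w = np.array([1.0, 2.5, -3.3])
x = np.array([10.0, 20.0, 30.0])
b = 4.0

# 1. Simple method: write out every term by hand
f_manual = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b

# 2. For loop: accumulate one term per iteration
f_loop = 0.0
for j in range(w.shape[0]):
    f_loop += w[j] * x[j]
f_loop += b

# 3. Summation function over the element-wise products
f_sum = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b

# 4. Vectorization: a single NumPy dot product
f_vec = np.dot(w, x) + b
```

All four variables hold the same prediction. The vectorized form is the shortest and lets NumPy do the arithmetic in optimized native code rather than in a Python loop.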

An alternative to gradient descent is the Normal Equation, which is suitable only for linear regression. It solves for 'w' and 'b' without iteration but becomes slow when the number of features is large (> 10,000).
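For illustration only, the normal equation can be sketched with NumPy as below. The toy data is an assumption, and the bias 'b' is handled by appending a column of ones to the feature matrix, which is one common convention rather than something stated in these notes.

```python
import numpy as np

# Hypothetical data generated from y = 2*x1 - 1*x2 + 5
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = X @ np.array([2.0, -1.0]) + 5.0

# Append a column of ones so the last solved parameter plays the role of b
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Normal equation: solve (X^T X) theta = X^T y directly, with no iterations
theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
w, b = theta[:-1], theta[-1]
```

Solving this linear system costs roughly the cube of the number of features, which is why the approach slows down when there are many features.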

Day 6

Feature Scaling for Gradient Descent

Feature scaling is a method used to normalize the range of the independent variables or features of the data. Suppose you have multiple independent variables such as age, salary, and height, with ranges of 18 to 60 years, NRs. 10,000 to NRs. 2,50,000, and 0.5 m to 2 m. Feature scaling brings them into a comparable range (typically about 0 to 1).

Feature scaling is important for the following reasons:

  • It helps to navigate the direct path to the global minimum.
  • Imagine the cost function like a landscape with hills and valleys. Feature scaling creates a smoother landscape for gradient descent to navigate. This smoother landscape allows the algorithm to take larger steps towards the minimum, making the optimization process more efficient.
  • It puts the features on an equal footing, so that features with large magnitudes do not overshadow features with small magnitudes.

The following methods can be used to perform scaling:

  • Feature scaling by the maximum (dividing each feature by its maximum value)
  • Mean Normalization
  • Z-score Normalization
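The three scaling methods can be sketched with NumPy as follows. The small feature matrix (age, salary, height) is made up for illustration, and "feature scaling" is taken here to mean dividing each feature by its maximum, which is an assumption about what the first bullet refers to.

```python
import numpy as np

# Hypothetical data: columns are age (years), salary (NRs.), height (m)
X = np.array([
    [18.0,  10000.0, 0.5],
    [35.0,  80000.0, 1.6],
    [60.0, 250000.0, 2.0],
])

# Scaling by the maximum: every feature ends up at most 1
X_max_scaled = X / X.max(axis=0)

# Mean normalization: centre on the mean, divide by the range (max - min)
mu = X.mean(axis=0)
X_mean_norm = (X - mu) / (X.max(axis=0) - X.min(axis=0))

# Z-score normalization: centre on the mean, divide by the standard deviation
sigma = X.std(axis=0)
X_zscore = (X - mu) / sigma
```

All statistics are computed per column (`axis=0`), so each feature is scaled independently of the others. The same `mu` and `sigma` would need to be reused when scaling new inputs at prediction time.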

We can identify the problems of gradient descent by the following methods:

Day 7

Classification with Logistic Regression

Today, I familiarized myself with the sigmoid function (also called the logistic function) and logistic regression, and then performed classification using linear and non-linear decision boundaries. The reason we use logistic regression instead of linear regression for classification is that linear regression outputs unbounded values, so thresholding its output is unreliable, whereas logistic regression always outputs a value between 0 and 1.

We can categorize decision boundaries into two types:

  • Linear Decision Boundaries
  • Non-Linear Decision Boundaries
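A minimal sketch of the sigmoid function and of a linear decision boundary is shown below. The parameters `w` and `b`, the 0.5 threshold, and the sample points are assumptions chosen so that the boundary is the line x1 + x2 = 3.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, b, threshold=0.5):
    """Predict class 1 where the sigmoid output reaches the threshold."""
    probabilities = sigmoid(X @ w + b)
    return (probabilities >= threshold).astype(int)

# Hypothetical parameters: the decision boundary is x1 + x2 - 3 = 0
w = np.array([1.0, 1.0])
b = -3.0

X = np.array([[1.0, 1.0],   # below the line
              [2.0, 2.0],   # above the line
              [3.0, 0.5]])  # above the line

labels = predict(X, w, b)
```

Points with x1 + x2 below 3 are assigned class 0 and the others class 1. A non-linear boundary would come from feeding polynomial terms of the features (such as x1 squared) into the same sigmoid.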

Day 8

Cost function for Logistic Regression

The graph of the mean squared error cost function is non-convex for logistic regression, because the input x is passed through the non-linear sigmoid function. On a non-convex function, gradient descent can get stuck in a local minimum, so the logistic regression model may fail to converge to the optimal values.

Loss Function

The loss function measures the error on a single training example. For logistic regression, we use a loss based on the logarithm, which makes the overall cost function convex so that gradient descent can reach the global minimum. Because the output of logistic regression ranges from 0 to 1, the loss is -log(f(x)) when y = 1 and -log(1 - f(x)) when y = 0.

**Here is a convex function when y = 1**

Note

For y = 0, we use the -log(1 - f(x)) function; the curve starts at 0 when f(x) = 0 and goes to infinity as f(x) approaches 1. The further the prediction f(x) is from the target y, the higher the loss.

The Logistic Loss Function

The logistic loss function can be written in a single line in the following way:

    L(f(x), y) = -y log(f(x)) - (1 - y) log(1 - f(x))

With this simplified formula, when y = 1 the second part becomes zero because 1 - 1 = 0, leaving only the first part. When y = 0, the first part becomes zero, leaving only the second part. In both cases we get the desired loss.
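The way one of the two terms vanishes can be checked with a small sketch. The function name `logistic_loss` and the example predictions 0.9 and 0.1 are assumptions used only for illustration.

```python
import numpy as np

def logistic_loss(f_x, y):
    """Single-example loss: -y*log(f) - (1 - y)*log(1 - f)."""
    return -y * np.log(f_x) - (1 - y) * np.log(1 - f_x)

# y = 1: only the first term survives, so the loss is -log(f(x))
loss_y1_good = logistic_loss(0.9, 1)   # prediction close to the target
loss_y1_bad = logistic_loss(0.1, 1)    # prediction far from the target

# y = 0: only the second term survives, so the loss is -log(1 - f(x))
loss_y0_good = logistic_loss(0.1, 0)
```

A prediction far from its target produces a larger loss than one close to it, which matches the curves described in the note.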

The cost function for Logistic Regression

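After substituting the loss function, the cost function for logistic regression becomes the average of the loss over all m training examples:

    J(w, b) = -(1/m) * Σ [ y(i) log(f(x(i))) + (1 - y(i)) log(1 - f(x(i))) ],  summed over i = 1, ..., m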
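As a rough sketch, the logistic regression cost can be computed by averaging the logistic loss over the training set. The toy dataset and the two parameter choices below are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(X, y, w, b):
    """Average logistic loss over all m training examples."""
    m = X.shape[0]
    f = sigmoid(X @ w + b)                          # predictions in (0, 1)
    losses = -y * np.log(f) - (1 - y) * np.log(1 - f)
    return np.sum(losses) / m

# Hypothetical two-feature dataset with binary labels
X_train = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5],
                    [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# With all-zero parameters every prediction is 0.5, so the cost is log(2)
cost_zero = logistic_cost(X_train, y_train, w=np.zeros(2), b=0.0)

# Parameters that separate the two classes give a lower cost
cost_better = logistic_cost(X_train, y_train, w=np.array([1.0, 1.0]), b=-3.0)
```

Unlike the squared-error cost for linear regression, this cost is divided by m rather than 2m.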

Day 9

Gradient Descent for Logistic Regression

Day 10

Lab