
This repository was created for learning purposes. It implements various supervised machine learning algorithms that use the data to predict the target variable.

Primary language: Python

FreeCodeCamp-DS

In this repository we implement various supervised machine learning algorithms that use the data to predict the target variable.

We follow these seven steps in almost every algorithm example:

  1. Collect the data
  2. Data cleaning
  3. Analyze the data (EDA)
  4. Feature engineering
  5. Train the model (Build the model)
  6. Calculate model accuracy
  7. Calculate the loss function

Collect the data

Import the data from the local machine, from a URL, or from another source.
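As a minimal sketch of this step, the function below reads a CSV file from either a local path or an http(s) URL using only the standard library. The name `load_rows` is hypothetical and not part of this repository; in practice a library such as pandas is a common choice for the same job.

```python
import csv
import io
import urllib.request


def load_rows(source):
    """Read a CSV from a local path or an http(s) URL into a list of dicts."""
    if source.startswith(("http://", "https://")):
        # Remote file: download the bytes and decode them as UTF-8 text.
        with urllib.request.urlopen(source) as response:
            text = response.read().decode("utf-8")
    else:
        # Local file: read the whole file from disk.
        with open(source, newline="", encoding="utf-8") as handle:
            text = handle.read()
    # DictReader treats the first row as the column names.
    # All values are returned as strings and still need type conversion.
    return list(csv.DictReader(io.StringIO(text)))
```

Every value comes back as a string, so numeric columns have to be converted (for example with `float`) before any analysis.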

Analyze the data

Once the data is available, analyze it to understand it: how the data is distributed, how the variables correlate, and how the variables relate to each other according to tests such as the t-test, the chi-square test, or ANOVA.
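Two of the checks mentioned here, the distribution of a column and the correlation between two columns, can be sketched with the standard library alone. The helper names and the `hours` and `score` values are made up for illustration; statistical tests such as the t-test or ANOVA would normally come from a dedicated statistics library and are not shown.

```python
import math
import statistics


def summarize(values):
    """Return basic distribution statistics for a numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example data: hours studied and exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 70.0, 72.0]
print(summarize(hours))
print(round(pearson(hours, score), 3))
```

A coefficient close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship.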

Data cleaning

This step covers how to handle missing values, how to deal with outliers, how to remove duplicates, and so on. Quality data beats fancy algorithms.
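The three cleaning tasks named above can be sketched as small functions over a list of row dictionaries. This is one possible approach under stated assumptions: missing values are represented as `None` and filled with the median, and outliers are defined by the common 1.5 × IQR rule. The function names and sample rows are hypothetical.

```python
import statistics


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing(rows, column):
    """Replace None in `column` with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    median = statistics.median(observed)
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]


def remove_outliers(rows, column, k=1.5):
    """Drop rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[column] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[column] <= high]


# Hypothetical sample with a duplicate row, a missing age, and an extreme age.
rows = [
    {"age": 25, "income": 40},
    {"age": 25, "income": 40},
    {"age": None, "income": 52},
    {"age": 31, "income": 48},
    {"age": 29, "income": 45},
    {"age": 27, "income": 43},
    {"age": 300, "income": 50},
]
cleaned = remove_outliers(fill_missing(drop_duplicates(rows), "age"), "age")
print(cleaned)
```

Whether to drop, cap, or keep outliers depends on the dataset; dropping them is only the simplest option.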

Feature engineering

Find the variables that are useful for training the model, using domain knowledge, previous experience, and various statistical methods.
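One simple statistical filter, shown here only as an illustration, is to drop columns that barely vary, since a constant column cannot help distinguish one example from another. The function name, the threshold, and the sample columns are assumptions made for this sketch.

```python
import statistics


def select_by_variance(columns, threshold=0.0):
    """Return names of columns whose population variance exceeds `threshold`."""
    return [
        name
        for name, values in columns.items()
        if statistics.pvariance(values) > threshold
    ]


# Hypothetical columns; "country_code" is constant and carries no information.
columns = {
    "age": [25, 31, 29, 27],
    "country_code": [1, 1, 1, 1],
    "income": [40, 48, 45, 43],
}
print(select_by_variance(columns))
```

Variance depends on the scale of each column, so a non-zero threshold is only meaningful after the columns have been brought to a comparable scale.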

Train and test (Build the model)

Divide the dataset into two parts, a train set and a test set. Use the train set to train the model, then evaluate the model on the test set to see how it performs.
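The split itself can be sketched as follows. The name `split_train_test`, the 80/20 ratio, and the fixed seed are assumptions chosen for illustration; scikit-learn provides a `train_test_split` function that is commonly used for the same purpose.

```python
import random


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of ten examples, identified here by index only.
train, test = split_train_test(list(range(10)))
print(len(train), len(test))
```

Shuffling before splitting matters when the rows are stored in a meaningful order, for example sorted by label, because an unshuffled split would then give the model an unrepresentative test set.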

Accuracy

Accuracy measures how well the model performs on test data, that is, data it has not seen during training. It is the share of predictions that are correct and is usually expressed as a percentage.
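For a classification model, accuracy is the number of correct predictions divided by the total number of predictions. A minimal sketch, with hypothetical labels and predictions:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical test labels and model predictions: 4 of 5 are correct.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(f"{accuracy(y_true, y_pred):.0%}")
```

Accuracy can be misleading when one class is much more frequent than the others, because always predicting the majority class already scores high.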

Loss function

The lower the loss, the better the model (unless the model has overfitted the training data). The loss is calculated on the training and validation sets, and it indicates how well the model is doing on each of them. Unlike accuracy, loss is not a percentage. It is a summation of the errors made for each example in the training or validation set.
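Two widely used loss functions are sketched below: mean squared error for regression and log loss (binary cross-entropy) for binary classification. Both average the per-example errors rather than summing them, a choice made here so that values stay comparable across datasets of different sizes. The function names and inputs are illustrative assumptions.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        # Clip probabilities away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical labels with confident-and-correct versus uncertain predictions.
labels = [1, 0, 1]
print(log_loss(labels, [0.9, 0.1, 0.8]))
print(log_loss(labels, [0.5, 0.5, 0.5]))
```

The first call yields a lower loss than the second because its probabilities are closer to the true labels. A training loss that keeps falling while the validation loss rises is the usual sign of overfitting.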