Machine Learning: Linear Regression and Time Series

Machine Learning

Machine learning is a field of computer science that enables a computer program to learn from a set of data.

Machine Learning Workflow

1 Data Profiling

Data profiling is a technique used to analyze and gain a better understanding of raw data. It is the first step in determining what insights the data can yield when it is run through machine learning algorithms to make predictions.
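A minimal profiling sketch, assuming the data is loaded into a pandas DataFrame. The columns and values are hypothetical; the idea is to collect dtypes, missing-value counts and distinct-value counts per column before any modelling.

```python
import pandas as pd

# Hypothetical monthly sales data, used only for illustration
df = pd.DataFrame({
    "month": pd.date_range("2023-01-01", periods=6, freq="MS"),
    "sales": [120.0, 135.0, None, 150.0, 150.0, 170.0],
    "region": ["A", "A", "B", "B", "B", None],
})

df.info()                 # column dtypes and non-null counts
print(df.describe())      # count, mean, std, min, quartiles, max of numeric columns

# One row per column: type, number of missing values, number of distinct values
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "unique": df.nunique(),   # NaN is not counted as a distinct value
})
print(profile)
print("duplicated rows:", df.duplicated().sum())
```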

2 Data Cleansing

Data cleaning is the process of identifying incorrect, incomplete, inaccurate, irrelevant, or missing parts of the data and then modifying, replacing, or deleting them as necessary.

Data cleaning is considered a foundational element of data science.
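The three cleaning actions above (deleting, replacing, modifying) can be sketched with pandas as follows. The data and the rules (negative sales are invalid, missing sales are filled with the median) are hypothetical choices for illustration, not the project's actual cleaning rules.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: a duplicate row, a missing value and an impossible negative sale
raw = pd.DataFrame({
    "store": ["S1", "S1", "S2", "S3", "S4"],
    "sales": [100.0, 100.0, np.nan, -5.0, 130.0],
})

# Delete exact duplicate rows
clean = raw.drop_duplicates()

# Delete invalid rows (negative sales) but keep missing values for the next step
valid = clean["sales"].isna() | (clean["sales"] >= 0)
clean = clean[valid]

# Replace missing sales with the median of the remaining valid values
clean = clean.assign(sales=clean["sales"].fillna(clean["sales"].median()))
clean = clean.reset_index(drop=True)
print(clean)
```

Filling with the median rather than the mean keeps a single extreme value from distorting the imputed numbers.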

3 EDA

Exploratory Data Analysis refers to the critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
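A small EDA sketch on hypothetical data: summary statistics, a correlation matrix, and the common 1.5 × IQR rule to flag anomalies. The IQR rule is one heuristic among several, and graphical views (histograms, scatter plots) are left out here because they need a plotting library such as matplotlib.

```python
import pandas as pd

# Hypothetical data: advertising spend vs. sales, with one suspicious value
df = pd.DataFrame({
    "ad_spend": [10, 12, 15, 18, 20, 22, 25, 28],
    "sales":    [105, 118, 149, 181, 198, 221, 900, 279],
})

print(df.describe())   # summary statistics per column
print(df.corr())       # pairwise linear (Pearson) correlation

# Spot anomalies: values more than 1.5 * IQR beyond the first or third quartile
q1, q3 = df["sales"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["sales"] < q1 - 1.5 * iqr) | (df["sales"] > q3 + 1.5 * iqr)]
print(outliers)
```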

4 Feature Engineering

Feature engineering is the process of selecting, manipulating, and transforming raw data into features that can be used in supervised learning: converting raw observations into the desired features using statistical or machine learning approaches.

Feature engineering is a machine learning technique that leverages data to create new variables that aren’t in the training set.
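As a sketch of creating new variables from raw columns, the example below derives calendar features from a timestamp, a ratio feature from two numeric columns, and one-hot columns from a category. All column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical daily sales records
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-02-10", "2024-03-15"]),
    "units": [10, 4, 8, 5],
    "revenue": [200.0, 100.0, 160.0, 125.0],
    "channel": ["online", "store", "online", "store"],
})

features = pd.DataFrame({
    # Calendar features extracted from the raw timestamp
    "month": df["date"].dt.month,
    "dayofweek": df["date"].dt.dayofweek,              # Monday=0 ... Sunday=6
    "is_weekend": (df["date"].dt.dayofweek >= 5).astype(int),
    # Ratio feature derived from two raw columns
    "price_per_unit": df["revenue"] / df["units"],
})

# One-hot encode the categorical column so a linear model can use it
features = features.join(pd.get_dummies(df["channel"], prefix="channel", dtype=int))
print(features)
```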

5 Preprocessing for Modelling

  • Feature Selection
  • Feature Importance
  • train_test_split

Preprocessing before modelling is important because it decides which variables go into the model; selecting them carefully helps the model reach a high and reliable accuracy.
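The three preprocessing steps listed above can be sketched with scikit-learn as follows. The data is synthetic, `k=2` is an arbitrary choice, and using a random forest's impurity-based importances is only one of several ways to measure feature importance (an assumption, not necessarily the project's method).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import train_test_split

# Hypothetical data: the target depends on x1 and x2; x3 is pure noise
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
y = 3 * X["x1"] - 2 * X["x2"] + rng.normal(scale=0.1, size=200)

# Feature selection: keep the k features with the highest univariate F-test score
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
selected = X.columns[selector.get_support()].tolist()
print("selected:", selected)

# Feature importance: impurity-based importances from a random forest
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importance = pd.Series(forest.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance)

# Hold out 20% of the rows for testing; shuffle=False keeps the original row order,
# which matters when the rows are ordered in time
X_train, X_test, y_train, y_test = train_test_split(
    X[selected], y, test_size=0.2, shuffle=False
)
print(X_train.shape, X_test.shape)
```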

6 Modelling

There are many kinds of machine learning models; in this case we use LinearRegression.
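A minimal sketch of fitting scikit-learn's `LinearRegression`, assuming synthetic data generated from a known line so the learned coefficients can be checked against it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated from y = 4 + 2.5 * x plus a little noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))   # scikit-learn expects a 2-D feature matrix
y = 4 + 2.5 * X[:, 0] + rng.normal(scale=0.5, size=100)

model = LinearRegression()
model.fit(X, y)

print("intercept:", model.intercept_)   # should be close to 4
print("slope:", model.coef_[0])         # should be close to 2.5
print("prediction for x=6:", model.predict([[6.0]])[0])
```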

7 Evaluate Model

Model evaluation assesses whether a trained model is good enough or still needs improvement, by comparing evaluation metrics such as MAE (mean_absolute_error) and MAPE (mean_absolute_percentage_error). For both metrics, the smaller the value, the better the model.
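The two metrics can be computed with scikit-learn and checked by hand. The values below are hypothetical. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction (0.0625), not a percentage (6.25%), and that MAPE becomes unreliable when actual values are zero or close to zero.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# Hypothetical actual values and model predictions
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 330.0, 400.0])

# MAE: mean of |actual - predicted|, in the same unit as the target
mae = mean_absolute_error(y_true, y_pred)
# MAPE: mean of |actual - predicted| / |actual|, returned as a fraction
mape = mean_absolute_percentage_error(y_true, y_pred)

# The same quantities computed by hand
mae_manual = np.mean(np.abs(y_true - y_pred))
mape_manual = np.mean(np.abs((y_true - y_pred) / y_true))

print(f"MAE  = {mae:.2f}")    # MAE  = 12.50
print(f"MAPE = {mape:.2%}")   # MAPE = 6.25%
```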

Regression

  • Simple Linear Regression: a regression model that estimates the relationship between one independent variable and one dependent variable using a straight line.

  • Multiple Linear Regression: models the linear relationship between two or more explanatory (independent) variables and a response (dependent) variable.
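To show what both models estimate, the sketch below computes ordinary least squares directly with NumPy on synthetic data: the closed-form slope and intercept for one predictor, and a least-squares solve with an intercept column for two predictors. This is an illustration of the underlying math; `LinearRegression` performs the equivalent fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simple linear regression: y = b0 + b1 * x
x = rng.uniform(0, 10, size=50)
y = 1.0 + 3.0 * x + rng.normal(scale=0.3, size=50)

# Closed-form ordinary least squares estimates for a single predictor
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"simple:   b0={b0:.2f}, b1={b1:.2f}")

# Multiple linear regression: y = b0 + b1 * x1 + b2 * x2
X = rng.uniform(0, 10, size=(50, 2))
y_multi = 2.0 + 0.5 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.3, size=50)

# Prepend a column of ones so the intercept is estimated as the first coefficient
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y_multi, rcond=None)
print("multiple: b0={:.2f}, b1={:.2f}, b2={:.2f}".format(*coef))
```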

Time Series

A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. For example, measuring the value of retail sales each month of the year would comprise a time series.
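One common way to apply linear regression to a time series is to turn past observations into features (lags). The sketch below assumes hypothetical monthly retail sales with a linear trend, uses the previous two months plus a time index as features, and splits the data chronologically so the model never trains on months after the test period. The lag count and the six-month test window are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical monthly retail sales: upward trend plus noise
rng = np.random.default_rng(7)
idx = pd.date_range("2020-01-01", periods=36, freq="MS")   # month-start timestamps
sales = pd.Series(100 + 2.0 * np.arange(36) + rng.normal(scale=3, size=36), index=idx)

# Supervised framing: predict this month's sales from the previous two months
frame = pd.DataFrame({
    "lag_1": sales.shift(1),
    "lag_2": sales.shift(2),
    "t": np.arange(36),        # time index to capture the trend
    "target": sales,
}).dropna()                    # the first two months have no lags

# Chronological split: the last 6 months are the test period
train, test = frame.iloc[:-6], frame.iloc[-6:]
features = ["lag_1", "lag_2", "t"]
model = LinearRegression().fit(train[features], train["target"])

forecast = pd.Series(model.predict(test[features]), index=test.index)
print(forecast.round(1))
```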

Cross Validation

Cross-validation is a technique for evaluating ML models by training several ML models on subsets of the available input data and evaluating them on the complementary subset of the data.
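A sketch of cross-validating a `LinearRegression` with scikit-learn on synthetic data, comparing shuffled K-fold splits with `TimeSeriesSplit`, which always trains on earlier rows and validates on later ones (the safer option for time-ordered data). The fold counts and the MAE scoring are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

# Hypothetical regression data
rng = np.random.default_rng(3)
X = rng.normal(size=(120, 2))
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.2, size=120)

model = LinearRegression()

# K-fold: each of the 5 folds is held out once while the model trains on the other 4
kfold_scores = cross_val_score(
    model, X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_absolute_error",
)

# Time-series split: each fold trains on the past and validates on the next block
ts_scores = cross_val_score(
    model, X, y,
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_mean_absolute_error",
)

# scikit-learn maximizes scores, so MAE is reported as a negative number
print("K-fold MAE per fold:", -kfold_scores)
print("Time-series MAE per fold:", -ts_scores)
print("mean K-fold MAE:", -kfold_scores.mean())
```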

Hyperparameter Tuning

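In machine learning, a hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are derived via training. Hyperparameter tuning is the search for the hyperparameter values that give the best model performance, usually measured with cross-validation.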
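Plain `LinearRegression` exposes few hyperparameters worth tuning, so this sketch swaps in scikit-learn's `Ridge` (linear regression with an L2 penalty) and searches over its regularization strength `alpha` with `GridSearchCV`. The data and the candidate grid are hypothetical; for time-ordered data, `cv=TimeSeriesSplit(...)` can replace `cv=5`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Hypothetical data: 10 features, only the first 3 carry signal
rng = np.random.default_rng(5)
X = rng.normal(size=(150, 10))
y = X[:, 0] + 0.5 * X[:, 1] - 0.8 * X[:, 2] + rng.normal(scale=0.5, size=150)

# alpha is a hyperparameter: it is fixed before training,
# while coef_ and intercept_ are parameters learned from the data
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

search = GridSearchCV(
    Ridge(),
    param_grid,
    cv=5,                                 # 5-fold cross-validation for each candidate
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)                          # refits the best candidate on all the data

print("best alpha:", search.best_params_["alpha"])
print("best CV MAE:", -search.best_score_)
print("learned coefficients:", search.best_estimator_.coef_.round(2))
```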