Step_by_Step_Data_Analysis_Auto_Dataset: A Jupyter Notebook repository from Farhad-Davaripour

Hands-on Practice Learning Lab for Data Science

Overview

This repository presents a step-by-step approach to data wrangling, descriptive statistical analysis, predictive analysis, model development, model evaluation, and decision making. The project is part of the Data Analysis with Python course offered on Coursera.org. The dataset contains automobile data provided in the course and can be downloaded from IBM Cloud.

✓ Link to the dataset: Link
✓ Link to the description of each column of the dataset: Link
✓ Link to the notebook: Link

In this study:

the following steps are carried out to address the missing values in the dataset:

replacing the missing values (marked as ? in this dataset) with np.nan
finding the columns which include missing values and counting the number of elements with missing values
replacing the missing values with the average of the values in the column
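
The three missing-value steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame and its values are made up for the example, and only the column names are borrowed in spirit from an automobile dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the auto dataset; "?" marks a missing entry.
df = pd.DataFrame({
    "horsepower": ["111", "?", "154", "102"],
    "price": ["13495", "16500", "?", "13950"],
    "make": ["audi", "bmw", "audi", "bmw"],
})

# Step 1: replace the "?" placeholder with NaN so pandas treats it as missing.
df = df.replace("?", np.nan)

# Step 2: count missing values per column and keep only the affected columns.
missing_counts = df.isnull().sum()
missing_counts = missing_counts[missing_counts > 0]
print(missing_counts)

# Step 3: convert each affected column to float and fill gaps with its mean.
# (Assumes the affected columns are numeric, as they are in this toy frame.)
for col in missing_counts.index:
    df[col] = df[col].astype(float)
    df[col] = df[col].fillna(df[col].mean())
```

Mean imputation only makes sense for numeric columns; a categorical column with gaps would need a different strategy, such as filling with the most frequent value.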

the following steps are taken in order to prepare the data for data analysis:

normalizing the values with the z-score, (value - average(column)) / standard_deviation(column)
binning the columns into categorical groups (e.g., low, medium, and high)
turning categorical variables into quantitative indicator variables (e.g., 0 and 1)
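
A short sketch of these three preparation steps, assuming pandas and NumPy. The DataFrame, its values, and the choice of three equal-width bins are assumptions made for illustration and may differ from what the notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; values are invented for the example.
df = pd.DataFrame({
    "length": [168.8, 171.2, 176.6, 192.7],
    "horsepower": [48.0, 100.0, 150.0, 200.0],
    "fuel-type": ["gas", "diesel", "gas", "gas"],
})

# Z-score normalization: (value - mean) / standard deviation.
# Note: pandas .std() uses the sample standard deviation (ddof=1) by default.
df["length_z"] = (df["length"] - df["length"].mean()) / df["length"].std()

# Binning: three equal-width bins between the column minimum and maximum.
bin_edges = np.linspace(df["horsepower"].min(), df["horsepower"].max(), 4)
df["horsepower_binned"] = pd.cut(
    df["horsepower"],
    bins=bin_edges,
    labels=["low", "medium", "high"],
    include_lowest=True,  # keep the minimum value inside the first bin
)

# Indicator (dummy) variables: one 0/1 column per category.
dummies = pd.get_dummies(df["fuel-type"], prefix="fuel", dtype=int)
df = pd.concat([df, dummies], axis=1)
```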

statistical descriptive analyses are performed using:

Chi-square and analysis of variance (ANOVA) tests for columns with the object data type
Pearson correlation (pearsonr) for columns with a numerical data type
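
The tests listed above are all available in scipy.stats. The sketch below assumes that library and uses an invented six-row DataFrame; with so few rows the p-values are not meaningful and serve only to show the calls.

```python
import pandas as pd
from scipy import stats

# Hypothetical sample; values are invented for the example.
df = pd.DataFrame({
    "engine-size": [90, 110, 130, 150, 170, 190],
    "price": [7000, 9500, 12000, 16000, 21000, 30000],
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd"],
    "fuel-type": ["gas", "gas", "diesel", "gas", "diesel", "diesel"],
})

# Pearson correlation between two numerical columns.
pearson_coef, pearson_p = stats.pearsonr(df["engine-size"], df["price"])

# One-way ANOVA: does the mean price differ between drive-wheel groups?
groups = [g["price"].to_numpy() for _, g in df.groupby("drive-wheels")]
f_stat, anova_p = stats.f_oneway(*groups)

# Chi-square test of independence between two categorical columns.
contingency = pd.crosstab(df["drive-wheels"], df["fuel-type"])
chi2, chi2_p, dof, expected = stats.chi2_contingency(contingency)

print(f"Pearson r = {pearson_coef:.3f}, ANOVA F = {f_stat:.3f}, chi2 dof = {dof}")
```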

In-sample testing:

splitting the data into training and test sets, with the test set holding 30% of the overall data
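
A 70/30 split like the one described can be done with scikit-learn's train_test_split. The sketch below assumes that function and uses synthetic data; the random_state value is an arbitrary choice made here so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: one predictor and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(60, 300, size=(100, 1))
y = 100 * X[:, 0] + rng.normal(0, 500, size=100)

# Hold out 30% of the rows as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(len(X_train), "training rows,", len(X_test), "test rows")
```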

model development is performed using:

Simple linear regression model
Multi-linear regression model
1-dimensional polynomial regression model
Multi-dimensional polynomial regression model
Ridge regression model
Grid search to find the Ridge regularization parameter (alpha) that yields the highest R-square
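
The model families listed above can each be fitted in a few lines with NumPy and scikit-learn. The following is a sketch under stated assumptions, not the notebook's code: the data are synthetic, and the polynomial degrees, the alpha grid, and the number of folds are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with two hypothetical predictors.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(120, 2))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + rng.normal(0, 0.05, size=120)

# Simple linear regression: a single predictor.
simple_lr = LinearRegression().fit(X[:, [0]], y)

# Multiple linear regression: all predictors.
multi_lr = LinearRegression().fit(X, y)

# One-dimensional polynomial regression (degree 3 chosen for illustration).
poly_1d = np.poly1d(np.polyfit(X[:, 0], y, deg=3))

# Multi-dimensional polynomial regression as a scale -> expand -> fit pipeline.
poly_lr = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("model", LinearRegression()),
]).fit(X, y)

# Ridge regression with a grid search over alpha, scored by R-square.
alphas = [0.001, 0.01, 0.1, 1, 10, 100]
grid = GridSearchCV(Ridge(), {"alpha": alphas}, cv=4, scoring="r2")
grid.fit(X, y)
best_ridge = grid.best_estimator_
best_alpha = grid.best_params_["alpha"]

print("best alpha:", best_alpha)
```

GridSearchCV scores each alpha by cross-validation on the data it is given, so the selected value reflects held-out folds and not the fit on the full set.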

model evaluation and decision making are carried out using the following statistical methods:

Mean Square Error (MSE)
R-Square
Cross validation
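
These three evaluation methods map onto scikit-learn helpers. The sketch below assumes that library, synthetic data, and four cross-validation folds (an arbitrary choice for the example).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data: a noisy straight line.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.5 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Mean Square Error: average squared gap between actual and predicted values.
mse = mean_squared_error(y_test, y_pred)

# R-square: share of the target's variance explained by the model.
r2 = r2_score(y_test, y_pred)

# Cross validation: one R-square per fold (the default score for regressors).
cv_scores = cross_val_score(LinearRegression(), X, y, cv=4)

print(f"MSE = {mse:.3f}, R-square = {r2:.3f}, CV mean = {cv_scores.mean():.3f}")
```

A lower MSE and a higher R-square on held-out data both point to a better fit, and a small spread across the cross-validation folds suggests the result does not depend heavily on one particular split.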

About The Author

Farhad Davaripour is a finite element specialist and data science enthusiast with nearly 3 years of experience in research and development roles. He has a knack for problem-solving and a passion for data science, and he holds the IBM Data Science Professional Certificate.
Connect with Farhad on LinkedIn.
