In this project I will discuss the techniques involved in data preprocessing.

Road-to-DataScience-ML-Step1---DataCleaning-Tidying

Let’s see what Data Preprocessing is about:

Before applying any predictive machine learning algorithm, we need to preprocess the data to improve accuracy. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors; applying our algorithms to it directly will decrease accuracy. Data preprocessing is a proven method of resolving such issues. It is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database: identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.

Check the steps involved in the machine learning process:

https://hackernoon.com/what-steps-should-one-take-while-doing-data-preprocessing-502c993e1caa

Software needed: I will mostly be uploading my Jupyter notebooks and Python code, so it is better to install:

  1. Anaconda
  2. PyCharm

Things I will discuss in this Repo:

Step 1. basiccmds.ipynb: Some basic commands that are useful when working with datasets.

Step 2. datacleaning(part1): Finding missing values and noisy data, then filling the missing values with basic statistics such as the mean, median, and mode. This is all implemented in a Jupyter notebook.
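As a rough idea of what this step looks like, here is a minimal sketch using pandas. The toy DataFrame and its column names are made up for illustration and are not taken from the notebooks; the choice of which statistic goes with which column is also just an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the column names are invented for this example.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "salary": [1000.0, 2000.0, np.nan, 9000.0],
    "city": ["Pune", "Delhi", None, "Delhi"],
})

# Find the missing values: count them per column.
missing_counts = df.isna().sum()

# Fill each column with a different basic statistic.
filled = df.copy()
filled["age"] = df["age"].fillna(df["age"].mean())              # mean
filled["salary"] = df["salary"].fillna(df["salary"].median())   # median
filled["city"] = df["city"].fillna(df["city"].mode()[0])        # mode (most frequent value)

print(missing_counts)
print(filled)
```

The median is usually less sensitive to outliers than the mean (note the 9000.0 salary), and the mode is the usual choice for categorical columns such as `city`.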

Step 3. datatidying: In this part we shall see the major rules that we need to follow while tidying the dataset.
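The rules themselves are covered in the notebook. As an illustration only, the sketch below assumes the commonly cited tidy-data conventions (each variable is a column, each observation is a row) and reshapes a hypothetical wide table with pandas `melt`. The table and its column names are invented for the example.

```python
import pandas as pd

# Hypothetical "wide" table: the subject names are stored as column headers,
# so one row holds several observations.
wide = pd.DataFrame({
    "student": ["A", "B"],
    "math": [90, 75],
    "physics": [85, 60],
})

# Reshape so that each row is one observation (student, subject, score).
tidy = wide.melt(id_vars="student", var_name="subject", value_name="score")

print(tidy)
```

After the reshape, `subject` is an ordinary column that can be filtered or grouped on, instead of being spread across the header.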

Step 4. datacleaning using various techniques: a series of Python scripts that fill missing values.

  4.1: Filling missing values using the mean.
  4.2: Filling missing values using the median.
  4.3: Filling missing values using the mode.
  4.4: Filling missing values using K-Nearest Neighbors.
  4.5: Filling missing values using multivariate imputation.
  4.6: Filling missing values using deep learning.
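For the K-Nearest Neighbors and multivariate imputation items, one possible approach (an assumption here, not necessarily what the scripts in this repo use) is scikit-learn's `KNNImputer` and `IterativeImputer`. The small array below is made-up data. The deep learning item is not sketched here.

```python
import numpy as np
# IterativeImputer is still experimental in scikit-learn and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer

# Made-up data: the second column is twice the first, with one value missing.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, np.nan],
    [4.0, 8.0],
])

# K-Nearest Neighbors: average the missing feature over the 2 closest rows.
knn_filled = KNNImputer(n_neighbors=2).fit_transform(X)

# Multivariate imputation: model each feature from the other features.
mice_filled = IterativeImputer(random_state=0).fit_transform(X)

print(knn_filled)
print(mice_filled)
```

With `n_neighbors=2`, the rows closest to `[3.0, nan]` are `[2.0, 4.0]` and `[4.0, 8.0]`, so the KNN imputer fills the gap with their average, 6.0. The iterative imputer fits a regression model instead, so its estimate depends on the estimator settings.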