In this project I discuss the techniques involved in data preprocessing.
Before applying any predictive machine learning algorithm, the data needs to be preprocessed to improve accuracy. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors; applying algorithms to it directly will reduce accuracy. Data preprocessing is a proven method of resolving such issues. A central part of it is data cleaning: detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. This means identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.
https://hackernoon.com/what-steps-should-one-take-while-doing-data-preprocessing-502c993e1caa
Software needed: I will mostly be uploading my Jupyter notebooks and Python code, so it is best to install:
- Anaconda
- PyCharm
Step 1. basiccmds.ipynb: basic commands that are useful when working with datasets.
Step 2. datacleaning(part1): finding missing values and noisy data, then filling the missing values with basic statistics such as the mean, median, and mode. This is all implemented in a Jupyter notebook.
Step 3. datatidying: the main rules to follow when tidying a dataset.
Step 4. datacleaning using various techniques: a series of Python scripts that fill missing values.
- 4.1: Filling missing values using the mean.
- 4.2: Filling missing values using the median.
- 4.3: Filling missing values using the mode.
- 4.4: Filling missing values using K-Nearest Neighbors.
- 4.5: Filling missing values using multivariate imputation.
- 4.6: Filling missing values using deep learning.
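
The mean, median, and mode fills mentioned above can be sketched roughly as follows. This is a minimal illustration using only Python's standard `statistics` module, not the code from the notebooks; the `fill_missing` helper and the `ages` sample column are hypothetical names made up for this example, and missing values are assumed to be represented as `None`.

```python
from statistics import mean, median, mode


def fill_missing(values, strategy="mean"):
    """Return a copy of `values` with None replaced by a summary statistic.

    `fill_missing` is a hypothetical helper for illustration only.
    """
    # Compute the statistic from the observed (non-missing) values only.
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")

    stat = {"mean": mean, "median": median, "mode": mode}[strategy]
    fill = stat(observed)

    # Keep observed values as they are; substitute only the gaps.
    return [fill if v is None else v for v in values]


# Made-up sample column with two missing entries.
ages = [22, None, 35, 22, None, 41]

filled_mean = fill_missing(ages, "mean")
filled_median = fill_missing(ages, "median")
filled_mode = fill_missing(ages, "mode")
```

In a notebook that already uses pandas, the same idea is usually written with `Series.fillna`, for example `df["age"].fillna(df["age"].median())`. The mean is sensitive to outliers, so the median is often the safer choice for skewed numeric columns, while the mode is the usual choice for categorical ones.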