In this project, I run several code examples that address the following tasks:
- Load the data from a CSV file.
- Check whether the data is valid. If it is not, go through the following points to correct it before performing any visualization:
  - Check for and remove NaN values and special characters in the data.
  - Check that every column has the correct data type; if not, convert its values so the whole column uses a single data type.
  - Check that the values themselves are correct, not just free of NaN or special characters: for example, whether a value should be 199 or 1.99. If a value is wrong, remove that row.
  - Check for outliers, i.e. exceptional values far outside the expected range, and remove them.
These are the basic steps to take before performing any other work on the data. For a Data Analyst or Data Scientist this is the first priority, because it ensures the data is correct and free of issues.
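The cleaning checks above can be sketched in pandas as follows. This is a minimal sketch under assumptions, not the project's actual code: the inline CSV text, its column names (`id`, `price`, `quantity`), and the helper `load_and_clean` are hypothetical; the special-character rule keeps only digits, `.` and `-`; and the outlier test uses the common 1.5*IQR rule, because the text does not say how outliers should be detected.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the project's CSV file.
CSV_TEXT = """id,price,quantity
1,19.9,3
2,$21.5,2
3,,4
4,20.1,five
5,18.7,2
6,999.0,1
7,20.4,3
8,19.5,2
"""


def load_and_clean(csv_source) -> pd.DataFrame:
    """Load a CSV and apply the cleaning checks (hypothetical helper)."""
    df = pd.read_csv(csv_source)

    # 1. Strip special characters (anything other than digits, '.' or '-')
    #    from the numeric columns, then convert to numbers. Values with no
    #    digits left (e.g. "five" or an empty cell) become NaN.
    for col in ["price", "quantity"]:
        cleaned = df[col].astype(str).str.replace(r"[^0-9.\-]", "", regex=True)
        df[col] = pd.to_numeric(cleaned, errors="coerce")

    # 2. Drop rows containing NaN (empty cells or failed conversions).
    df = df.dropna().copy()

    # 3. Give each column one consistent dtype (quantity is a whole number).
    df["quantity"] = df["quantity"].astype(int)

    # 4. Remove outliers with the 1.5*IQR rule (an assumed choice of method).
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[in_range].reset_index(drop=True)


clean = load_and_clean(io.StringIO(CSV_TEXT))
print(clean)
```

With a real file, the same helper would be called as `load_and_clean("data.csv")` (the file name is a placeholder). Converting with `errors="coerce"` turns unparseable values into NaN, so a single `dropna()` removes both empty cells and malformed entries.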
In pandas, the work breaks down into the following steps:
- pandas Series
- pandas DataFrames
- read CSV
- read JSON
- analyze data
- clean data
  - clean empty cells
  - clean wrong format
  - clean wrong data
  - remove duplicates
- correlations
- plotting
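A compact sketch of these steps in pandas follows. It is illustrative only: the Series values, the JSON workout records, and the 120-minute duration limit are hypothetical, and filling empty cells with the column mean is just one reasonable choice among several.

```python
import io

import pandas as pd

# pandas Series: a one-dimensional labelled array (hypothetical values).
calories = pd.Series([420, 380, 390], index=["day1", "day2", "day3"])

# Hypothetical workout records; the JSON text stands in for a .json file.
JSON_TEXT = """[
  {"Date": "2020/12/01", "Duration": 60, "Pulse": 110, "Calories": 409.1},
  {"Date": "2020/12/02", "Duration": 60, "Pulse": 117, "Calories": 479.0},
  {"Date": "2020/12/02", "Duration": 60, "Pulse": 117, "Calories": 479.0},
  {"Date": "20201204", "Duration": 45, "Pulse": 109, "Calories": null},
  {"Date": "2020/12/05", "Duration": 450, "Pulse": 98, "Calories": 282.4}
]"""

# Read JSON into a DataFrame (pd.read_csv plays the same role for CSV).
# convert_dates=False keeps "Date" as raw strings so it can be fixed below.
df = pd.read_json(io.StringIO(JSON_TEXT), orient="records", convert_dates=False)

# Analyze: column dtypes and non-null counts, then summary statistics.
df.info()
print(df.describe())

# Clean empty cells: fill missing Calories with the column mean.
df["Calories"] = df["Calories"].fillna(df["Calories"].mean())

# Clean wrong format: unify "YYYY/MM/DD" and "YYYYMMDD" into real dates.
df["Date"] = pd.to_datetime(
    df["Date"].str.replace("/", "", regex=False), format="%Y%m%d"
)

# Clean wrong data: drop rows with an implausible Duration
# (the 120-minute limit is a hypothetical rule for this sample).
df = df[df["Duration"] <= 120]

# Remove duplicates: keep only the first of any identical rows.
df = df.drop_duplicates()

# Correlations between the numeric columns.
corr = df[["Duration", "Pulse", "Calories"]].corr()
print(corr)
```

Plotting is left out of the sketch because `DataFrame.plot` needs matplotlib installed; with it available, a call such as `df.plot(kind="scatter", x="Duration", y="Calories")` draws a scatter plot of two columns.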
I will follow these steps one by one.
Note: To run the Jupyter Notebook code in VS Code, you also need to install the `ipykernel` package.