Data analysis is the process of working through given data to obtain desired information and draw conclusions.
Data analysis usually consists of the following steps.
- Topic selection
- Data structure identification
- Data preprocessing
- Data analysis implementation
In the topic selection step, set the purpose of the analysis before you start: which data to select, what hypotheses to form from the data, and what conclusions you hope to reach.
In the data structure identification step, find out how the data is stored: the kind of structure that holds it, the variable names, and the data type of each variable. You can also apply statistical functions to a data frame to see the distribution or tendencies of the data.
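As a minimal sketch of this kind of inspection, the example below builds a small pandas data frame and looks at its shape, variable names, data types, and summary statistics. The data and the column names (`name`, `age`, `score`) are made up for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Kim", "Lee", "Park", "Choi"],
    "age": [23, 35, 41, 29],
    "score": [88.5, 92.0, 79.5, 85.0],
})

shape = df.shape             # (number of rows, number of columns)
columns = list(df.columns)   # variable names
dtypes = df.dtypes           # data type of each variable
summary = df.describe()      # count, mean, std, min, quartiles, max of numeric columns

print(shape)
print(columns)
print(dtypes)
print(summary)
```

`describe()` summarizes only the numeric columns by default, so `name` does not appear in `summary` here.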
In the data preprocessing step, before the analysis itself, extract only the variables you need or compute new variables from existing ones. If the data contains missing values or outliers, you must remove them properly at this stage so that the results of the analysis can be trusted.
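The preprocessing described here might look like the following sketch. The data frame, its column names, and the 100 to 250 cm range used to flag outliers are all assumptions chosen for illustration; a real analysis would pick its own criteria.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing value and one implausible height.
raw = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "height_cm": [170.0, 165.0, np.nan, 180.0, 999.0],
    "weight_kg": [65.0, 55.0, 70.0, 80.0, 60.0],
    "memo": ["a", "b", "c", "d", "e"],
})

# Keep only the variables needed for the analysis.
df = raw[["id", "height_cm", "weight_kg"]]

# Remove rows that contain missing values.
df = df.dropna()

# Remove outliers; the 100-250 cm range is an assumed rule, not a standard.
df = df[df["height_cm"].between(100, 250)]

# Compute a new variable from existing ones.
df = df.assign(bmi=df["weight_kg"] / (df["height_cm"] / 100) ** 2)

print(df)
```

Cleaning the rows before deriving `bmi` keeps the implausible height from producing a meaningless derived value.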
In the data analysis implementation step, you test the hypothesis established during topic selection, or obtain the desired information, by computing on and processing the data with NumPy and pandas. Visualization is also used to present the resulting information effectively.
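As a rough sketch of this step, suppose the hypothesis were "group B scores higher on average than group A, and score rises with study hours". Both the hypothesis and the data below are invented for illustration; the example only shows how pandas and NumPy could be combined to check such a claim.

```python
import numpy as np
import pandas as pd

# Hypothetical, already preprocessed data.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "score": [50.0, 55.0, 60.0, 65.0, 70.0, 75.0],
})

# pandas: average score per group.
group_means = df.groupby("group")["score"].mean()

# NumPy: Pearson correlation between study hours and score.
corr = np.corrcoef(df["hours"], df["score"])[0, 1]

print(group_means)
print(corr)
```

If matplotlib is installed, a call such as `group_means.plot(kind="bar")` would be one way to show the group comparison as a chart.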