This project was carried out for a data science class to practice the following concepts:
- Sampling and bias
- Hypothesis testing
- Visualization
- Other useful data manipulation.
The Panel Study of Income Dynamics (PSID) dataset contains information about 4856 people: their age, education, earnings, hours, number of kids, and marital status. We analyze whether the number of hours a person works has an impact on their earnings.
The Jupyter notebook contains all the work and describes each of the following steps:
- Data loading.
- Data describing.
- Data visualization.
- Missing value handling.
- Invalid data removal.
- Correlation of variables.
- Setting hypothesis.
- Checking that the data is approximately normally distributed.
- Random sampling.
- Hypothesis testing using t-test/p-values.
- Executive summary and detailed reporting.
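
The cleaning, sampling, and testing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the records are synthetic stand-ins rather than the PSID data, the field names `hours` and `earnings`, the sample size of 500, the median split, and the helper `welch_t_test` are all assumptions made for the example. It uses only the Python standard library and approximates the p-value with the standard normal distribution, which is reasonable for large samples; the notebook may well use `scipy.stats.ttest_ind` instead.

```python
import math
import random
from statistics import NormalDist, mean, median, variance


def welch_t_test(a, b):
    """Two-sided Welch's t-test (hypothetical helper).

    Returns (t, df, p). The p-value uses a normal approximation,
    which is only appropriate when both samples are large.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances
    se2 = va / na + vb / nb            # squared standard error of the difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-in records (NOT the PSID data).
records = []
for _ in range(2000):
    hours = rng.gauss(1800, 500)
    earnings = 12 * hours + rng.gauss(0, 8000)
    records.append({"hours": hours, "earnings": earnings})

# Inject a missing value and an invalid value to exercise the cleaning step.
records.append({"hours": None, "earnings": 30000.0})
records.append({"hours": -5.0, "earnings": 1000.0})

# Missing value handling and invalid data removal.
clean = [
    r for r in records
    if r["hours"] is not None
    and r["earnings"] is not None
    and r["hours"] > 0
    and r["earnings"] > 0
]

# Random sampling without replacement.
sample = rng.sample(clean, 500)

# Split the sample at the median of hours and compare mean earnings.
cutoff = median(r["hours"] for r in sample)
high = [r["earnings"] for r in sample if r["hours"] > cutoff]
low = [r["earnings"] for r in sample if r["hours"] <= cutoff]

# H0: mean earnings are equal in both groups; H1: they differ.
t, df, p = welch_t_test(high, low)
alpha = 0.05
reject_null = p < alpha
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.4f}, reject H0: {reject_null}")
```

Splitting at the median is only one way to turn "do hours affect earnings?" into a two-group comparison; a correlation test on the raw values would be an equally valid framing.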