hypothesis-testing

This repo contains a hypothesis testing exercise done in Python for a Data Science class.

A basic example of hypothesis testing is presented.

This project was carried out in a data science class to practice the following concepts:

  1. Sampling and bias
  2. Hypothesis testing
  3. Visualization
  4. Other useful data manipulation.

Data-set (PSID.csv file)

The Panel Study of Income Dynamics (PSID) dataset contains information about 4856 people. It records each person's age, education, earnings, hours, number of kids, and marital status. We analyze whether the number of hours a person works has an impact on their earnings.
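
As a first look at the question above, the relationship between hours and earnings can be summarized with a Pearson correlation. The following is only a minimal sketch, not the notebook's actual code: the column names (`hours`, `earnings`, and the rest) and the helper functions `load_psid` and `hours_earnings_correlation` are assumptions for illustration, and the inline CSV is made-up data standing in for PSID.csv so the example runs on its own.

```python
import io

import pandas as pd

# Tiny made-up stand-in for PSID.csv so the sketch runs on its own.
# The column names are assumptions, not taken from the real file.
SAMPLE_CSV = """age,education,earnings,hours,kids,married
31,12,20000,2000,2,married
45,16,40000,2500,1,married
28,10,12000,1500,0,single
52,12,0,0,3,divorced
36,14,9000,1000,1,single
"""


def load_psid(source):
    """Read a PSID-style CSV from a file path or file-like object."""
    return pd.read_csv(source)


def hours_earnings_correlation(df, hours_col="hours", earnings_col="earnings"):
    """Pearson correlation between hours and earnings, skipping incomplete rows."""
    pair = df[[hours_col, earnings_col]].dropna()
    return pair[hours_col].corr(pair[earnings_col])


# For the real data this would be load_psid("PSID.csv").
df = load_psid(io.StringIO(SAMPLE_CSV))
print(df[["hours", "earnings"]].describe())
print("Pearson r:", round(hours_earnings_correlation(df), 3))
```

A correlation only measures linear association. It does not by itself show that working more hours causes higher earnings, which is why the later steps set up a formal hypothesis test.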

Implementation (ds project.ipynb)

This Jupyter notebook contains all of the work, with each step described.

The following steps are taken:

  1. Data loading.
  2. Data describing.
  3. Data visualization.
  4. Missing value handling.
  5. Invalid data removal.
  6. Correlation of variables.
  7. Setting hypothesis.
  8. Check whether the data is normally distributed.
  9. Random sampling.
  10. Hypothesis testing using t-test / p-values.
  11. Executive summary and detailed reporting.
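
Steps 8 to 10 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: the data are synthetic (generated so that earnings rise with hours), the split into "low" and "high" hours groups at the median is an assumed way of framing the hypothesis, and the Shapiro-Wilk check, Welch's t-test, and the 0.05 significance level are choices made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for the cleaned data (NOT the PSID values):
# earnings are generated to rise with hours, plus noise.
hours = rng.uniform(500, 3000, size=2000)
earnings = 10 * hours + rng.normal(0, 5000, size=2000)

# Split earnings into two groups at the median number of hours (assumed grouping).
median_hours = np.median(hours)
low = earnings[hours <= median_hours]
high = earnings[hours > median_hours]

# Step 9: draw a random sample from each group without replacement.
low_sample = rng.choice(low, size=200, replace=False)
high_sample = rng.choice(high, size=200, replace=False)

# Step 8: Shapiro-Wilk normality check. A small p-value is evidence against
# normality; a large one does not prove that the data are normal.
for name, sample in (("low", low_sample), ("high", high_sample)):
    w_stat, p_norm = stats.shapiro(sample)
    print(f"{name}: Shapiro-Wilk W={w_stat:.3f}, p={p_norm:.3f}")

# Step 10: Welch's two-sample t-test (does not assume equal variances).
# H0: mean earnings are equal in both groups; H1: they differ.
t_stat, p_value = stats.ttest_ind(high_sample, low_sample, equal_var=False)
alpha = 0.05  # assumed significance level
print(f"t={t_stat:.2f}, p={p_value:.3g}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Sampling before testing keeps the example close to the listed steps. With a real dataset, the choice of grouping and of test (for example a one-sided alternative) should follow from the hypothesis set in step 7.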