/LA-Payroll-Data-Analysis

CSE 544 Probability and Statistics for Data Science.

CSE544project

Hypothesis testing is an essential statistical procedure for evaluating two mutually exclusive hypotheses about a data set and determining which one is better supported by the sample data.

We picked the LA payroll data of government employees from 2013 to 2016 and looked for statistical answers and insights in it. We are mainly interested in the following hypotheses:

  1. Annual pay and hourly pay do not increase over the years.

  2. Work in non-risky departments is steady and therefore easier to forecast, whereas work in risky departments is unpredictable, so their salary distributions differ between the two halves of the year.

  3. Health benefits follow the same distribution across career ladders.

  4. Annual salaries can be predicted with very low error after the required data pre-processing.
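
As an illustration of how the first hypothesis could be checked, here is a minimal sketch of a one-sided two-sample Wald test on the difference in mean pay between two years. It is not the project's notebook code: the function name `wald_two_sample`, the variable names, and the synthetic pay figures are all assumptions made for this example, and the normal approximation assumes reasonably large samples.

```python
import math
import random
from statistics import NormalDist, mean, variance


def wald_two_sample(x, y):
    """One-sided Wald test of H0: mean(y) <= mean(x) against H1: mean(y) > mean(x)."""
    diff = mean(y) - mean(x)
    # Plug-in standard error of the difference in sample means.
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    w = diff / se
    # Under H0 the statistic is approximately standard normal for large samples.
    p_value = 1.0 - NormalDist().cdf(w)
    return w, p_value


# Synthetic stand-in data (assumption), NOT the LA payroll figures.
rng = random.Random(0)
pay_2013 = [rng.gauss(70_000, 15_000) for _ in range(500)]
pay_2016 = [rng.gauss(75_000, 15_000) for _ in range(500)]

w, p = wald_two_sample(pay_2013, pay_2016)
print(f"W = {w:.2f}, one-sided p-value = {p:.4f}")
```

A small p-value would be evidence against the claim that pay does not increase; with the real data, the two lists would be replaced by the pay columns for the years being compared.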

Techniques used:

  1. Two-sample t-test

  2. Wald's test

  3. Permutation test

  4. Kolmogorov-Smirnov (KS) test

  5. Linear Regression

  6. Estimator
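
To make two of the listed techniques concrete, the sketch below combines them: it computes the two-sample KS statistic and uses it inside a permutation test of whether two samples share a distribution, which is the kind of question hypotheses 2 and 3 pose. This is an illustrative sketch under stated assumptions, not the project's own code: the helper names `ks_statistic` and `permutation_test`, the resample count, and the synthetic salary samples are all made up for the example.

```python
import random
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    # The gap can only change at observed values, so checking those is enough.
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )


def permutation_test(x, y, statistic, n_resamples=1000, seed=0):
    """Approximate p-value for H0: x and y come from the same distribution."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        # Relabel the pooled values at random, keeping the original group sizes.
        if statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_resamples + 1)


# Synthetic stand-in data (assumption), NOT the LA payroll figures.
rng = random.Random(1)
first_half = [rng.gauss(35_000, 5_000) for _ in range(200)]
second_half = [rng.gauss(38_000, 9_000) for _ in range(200)]

d, p = permutation_test(first_half, second_half, ks_statistic)
print(f"KS statistic = {d:.3f}, permutation p-value = {p:.4f}")
```

The permutation test makes no distributional assumption about the statistic, which is why it pairs naturally with the KS distance; any other two-sample statistic, such as a difference in means, could be passed in the same way.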