Question 1
How many respondents are there? Are there any missing values in any of the variables?
What is the lowest salary and highest salary in the group?
What is the mean salary for the sample? Include the standard error of the mean.
What is the standard deviation for the years worked?
What is the median salary for the sample?
What is the interquartile range for salary in the sample?
How many men are there in the sample? How many women are there in the sample? Present this information in a table.
How many women are executives, and how many men are executives?
Create a histogram for the variable Salary.
Examine the histogram and describe the distribution for Salary.
Create a bar graph showing the average salaries of men and women. (Bonus: Add error bars showing the 95% confidence interval.) What does the graph tell you about the difference between men’s and women’s salaries?
Create a scatterplot showing the relationship between Years Worked and Salary (don’t forget to insert a trend line). What is the relationship between Years Worked and Salary?
Describe any patterns in the scatterplot. Do you notice any unusual/extreme values that do not fit the general trend? If you see any unusual values, briefly describe them (Who are they? In what way are they different?)
Using the pearsonr function from the scipy.stats package, calculate the Pearson correlation coefficient (and its corresponding p-value) to determine the nature of the relationship between Years Worked and Salary. See help(pearsonr) for help on this function.
Interpret the size and direction of the correlation statistic.
Is the relationship statistically significant? Report the appropriate statistic(s) to support your answer.
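
As a starting point for the numeric questions above, here is a minimal sketch of how the standard error of the mean, the interquartile range, and the Pearson correlation might be computed. It runs on a small made-up table, not the assignment dataset. The column names salary and years_worked and all of the values are assumptions for illustration only; substitute the actual file and variable names.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, sem

# Hypothetical toy data: the column names and values are made up for
# illustration and are NOT the assignment dataset.
df = pd.DataFrame({
    "salary": [30000, 35000, 42000, 48000, 51000, 60000],
    "years_worked": [1, 3, 5, 8, 10, 15],
})

# Number of respondents and count of missing values per variable
n = len(df)
missing = df.isna().sum()

# Mean salary and its standard error (sample SD with ddof=1, divided by sqrt(n))
mean_salary = df["salary"].mean()
se_salary = sem(df["salary"])

# Interquartile range: 75th percentile minus 25th percentile
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1

# Pearson correlation coefficient and its two-sided p-value
r, p = pearsonr(df["years_worked"], df["salary"])

print(f"n = {n}")
print(f"missing values per column:\n{missing}")
print(f"mean salary = {mean_salary:.2f} (SE = {se_salary:.2f})")
print(f"IQR of salary = {iqr:.2f}")
print(f"r = {r:.3f}, p = {p:.4f}")
```

The sign of r gives the direction of the relationship and its magnitude gives the strength. The p-value is what would be compared against the chosen significance level (commonly 0.05) when deciding whether the relationship is statistically significant.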