This GitHub repository contains an analysis of an employee salaries dataset. The dataset describes the employees of an organization, including their compensation, job titles, and other relevant details, in the following columns:

- 'Id': Employee identification number
- 'EmployeeName': Name of the employee
- 'JobTitle': Job title of the employee
- 'BasePay': Base salary of the employee
- 'OvertimePay': Overtime pay received by the employee
- 'OtherPay': Additional pay or bonuses
- 'Benefits': Employee benefits
- 'TotalPay': Total salary (sum of BasePay, OvertimePay, and OtherPay)
- 'TotalPayBenefits': Total compensation including benefits
- 'Year': Year of the recorded data
- 'Notes': Additional notes (if any)
- 'Agency': Organization or agency name
- 'Status': Employment status

The analysis works through the following tasks:

- Identify the number of rows and columns in the dataset.
- Determine the data types of each column.
- Check for missing values in each column.
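
The three exploration steps above can be sketched with pandas as follows. The sample rows are invented for illustration and only a few of the listed columns are included; the real dataset would normally be loaded from a file instead.

```python
import pandas as pd

# Hypothetical sample rows (not real data); the actual dataset would be
# loaded from a file, for example with pd.read_csv.
df = pd.DataFrame(
    {
        "Id": [1, 2, 3, 4],
        "EmployeeName": ["A. Smith", "B. Jones", "C. Lee", "D. Kim"],
        "JobTitle": ["Clerk", "Engineer", "Clerk", "Engineer"],
        "BasePay": [50000.0, 90000.0, None, 85000.0],
        "OvertimePay": [1000.0, 0.0, 500.0, None],
        "Year": [2011, 2011, 2012, 2012],
    }
)

n_rows, n_cols = df.shape      # number of rows and columns
dtypes = df.dtypes             # data type of each column
missing = df.isnull().sum()    # count of missing values per column

print(f"{n_rows} rows, {n_cols} columns")
print(dtypes)
print(missing)
```
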
- Calculate basic statistics such as mean, median, mode, minimum, and maximum salary.
- Determine the range of salaries.
- Find the standard deviation of salaries.
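
A minimal sketch of these descriptive statistics, assuming 'TotalPay' is the salary column of interest. The values below are made up for illustration.

```python
import pandas as pd

# Hypothetical salary values, for illustration only.
total_pay = pd.Series(
    [40000.0, 55000.0, 55000.0, 70000.0, 120000.0], name="TotalPay"
)

stats = {
    "mean": total_pay.mean(),
    "median": total_pay.median(),
    # mode() returns a Series because several values can tie; take the first.
    "mode": total_pay.mode().iloc[0],
    "min": total_pay.min(),
    "max": total_pay.max(),
    "range": total_pay.max() - total_pay.min(),
    # pandas computes the sample standard deviation (ddof=1) by default.
    "std": total_pay.std(),
}

for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

Note that `Series.std` uses the sample formula; pass `ddof=0` if the population standard deviation is wanted instead.
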
- Handle missing data using suitable methods, with an explanation of the chosen approach.
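
One possible way to handle missing data is sketched below. The choices here are assumptions made for the example, not necessarily the approach taken in this repository: missing 'BasePay' is filled with the median, missing 'OvertimePay' is treated as zero, columns that are entirely empty are dropped, and rows without an 'EmployeeName' are removed.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only.
df = pd.DataFrame(
    {
        "EmployeeName": ["A. Smith", "B. Jones", None, "D. Kim"],
        "BasePay": [50000.0, None, 60000.0, 70000.0],
        "OvertimePay": [1000.0, 0.0, None, 500.0],
        "Notes": [None, None, None, None],
    }
)

cleaned = df.copy()

# Assumption: the median is a reasonable stand-in for a missing base salary,
# since it is less sensitive to extreme values than the mean.
cleaned["BasePay"] = cleaned["BasePay"].fillna(cleaned["BasePay"].median())

# Assumption: a missing overtime value means no overtime was paid.
cleaned["OvertimePay"] = cleaned["OvertimePay"].fillna(0.0)

# Columns with no values at all carry no information, so drop them.
cleaned = cleaned.dropna(axis="columns", how="all")

# A record without an employee name cannot be identified, so drop the row.
cleaned = cleaned.dropna(subset=["EmployeeName"])

print(cleaned)
```
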
- Create histograms or bar charts to visualize the distribution of salaries.
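- Use pie charts to represent the proportion of employees in different departments (the column list has no department field, so a column such as 'JobTitle' or 'Agency' has to stand in for it).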
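
The two plots might be produced with matplotlib roughly as follows. The data is invented, 'JobTitle' is used as a stand-in for departments (an assumption, since no department column is listed), and the output filename `salary_plots.png` is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, for illustration only.
df = pd.DataFrame(
    {
        "JobTitle": ["Clerk", "Engineer", "Clerk", "Manager", "Engineer", "Clerk"],
        "TotalPay": [42000.0, 95000.0, 45000.0, 120000.0, 88000.0, 40000.0],
    }
)

fig, (ax_hist, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram of the salary distribution.
ax_hist.hist(df["TotalPay"], bins=5)
ax_hist.set_xlabel("TotalPay")
ax_hist.set_ylabel("Number of employees")
ax_hist.set_title("Distribution of TotalPay")

# Pie chart of the share of employees per job title.
title_counts = df["JobTitle"].value_counts()
ax_pie.pie(
    title_counts.values,
    labels=title_counts.index.tolist(),
    autopct="%1.1f%%",
)
ax_pie.set_title("Employees by JobTitle")

fig.tight_layout()
fig.savefig("salary_plots.png")  # hypothetical output filename
```
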
- Group the data by one or more columns.
- Calculate summary statistics for each group.
- Compare average salaries across different groups.
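
The grouping steps above could look like this in pandas. Grouping by 'JobTitle' and 'Year' is an assumption chosen for the example, and the rows are invented.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only.
df = pd.DataFrame(
    {
        "JobTitle": ["Clerk", "Engineer", "Clerk", "Engineer", "Manager"],
        "Year": [2011, 2011, 2012, 2012, 2012],
        "TotalPay": [40000.0, 90000.0, 44000.0, 98000.0, 120000.0],
    }
)

# Summary statistics of TotalPay for each job title.
by_title = df.groupby("JobTitle")["TotalPay"].agg(
    ["count", "mean", "median", "min", "max"]
)

# Average salary per job title, highest first, for easy comparison.
avg_by_title = by_title["mean"].sort_values(ascending=False)

# Grouping by more than one column: average salary per job title and year.
by_title_year = df.groupby(["JobTitle", "Year"])["TotalPay"].mean()

print(by_title)
print(avg_by_title)
print(by_title_year)
```
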
- Identify any correlation between salary and another numerical column.
- Plot a scatter plot to visualize the relationship.
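
As a sketch of the correlation step, the example below compares 'TotalPay' with 'BasePay'. The choice of 'BasePay' as the second numerical column is an assumption, the values are invented, and the output filename `salary_scatter.png` is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, for illustration only.
df = pd.DataFrame(
    {
        "BasePay": [40000.0, 55000.0, 70000.0, 85000.0, 100000.0],
        "TotalPay": [43000.0, 60000.0, 74000.0, 95000.0, 108000.0],
    }
)

# Series.corr computes the Pearson correlation coefficient by default.
corr = df["BasePay"].corr(df["TotalPay"])
print(f"Pearson correlation between BasePay and TotalPay: {corr:.3f}")

# Scatter plot of the two columns to inspect the relationship visually.
fig, ax = plt.subplots()
ax.scatter(df["BasePay"], df["TotalPay"])
ax.set_xlabel("BasePay")
ax.set_ylabel("TotalPay")
ax.set_title("TotalPay vs. BasePay")
fig.savefig("salary_scatter.png")  # hypothetical output filename
```

Because 'TotalPay' is defined as a sum that includes 'BasePay', a strong positive correlation between these two columns is expected; a column that does not feed into the total would make for a more informative comparison.
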
- Write a brief report summarizing the findings and insights from the analyses.