This project explores the San Francisco employee salary dataset to uncover insights and trends related to employee compensation.
The dataset contains information on salaries, job titles, benefits, and other attributes for San Francisco city employees.
The analysis focuses on:
- Exploring Data Distributions: Understanding the distributions of salary components like BasePay, OvertimePay, Benefits, and TotalPay.
- Job Title Analysis: Grouping job titles into general categories so that pay differences between groups are easier to compare.
- Pay Trends by Year: Analyzing salary trends across different years.
- Gender Pay Gap: Investigating potential gender-based pay disparities.
- Top Earning Jobs: Identifying job groups with the highest total pay.
- Correlation Analysis: Examining correlations between different salary components.
- Predictive Modeling: Building a linear regression model to predict BasePay based on other factors.
- Classification Modeling: Applying Logistic Regression and Random Forest to classify employee gender based on salary and job features.
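The job title grouping and year-by-year comparison listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the `JobTitle` and `Year` column names are assumed, and the keyword-to-group mapping in `JOB_GROUPS` is hypothetical.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset.
df = pd.DataFrame({
    "JobTitle": ["Police Officer 3", "Firefighter", "Registered Nurse",
                 "Police Officer 2", "Transit Operator", "Firefighter"],
    "Year": [2011, 2011, 2012, 2012, 2013, 2013],
    "TotalPay": [120000.0, 130000.0, 110000.0, 115000.0, 70000.0, 135000.0],
})

# Illustrative keyword-to-group mapping (an assumption, not the project's rules).
JOB_GROUPS = {
    "police": "Police",
    "fire": "Fire",
    "nurse": "Medical",
}


def categorize(title: str) -> str:
    """Map a raw job title to a general group by simple keyword match."""
    lowered = title.lower()
    for keyword, group in JOB_GROUPS.items():
        if keyword in lowered:
            return group
    return "Other"


df["JobGroup"] = df["JobTitle"].apply(categorize)

# Mean total pay per job group, highest first.
pay_by_group = df.groupby("JobGroup")["TotalPay"].mean().sort_values(ascending=False)

# Mean total pay per year.
pay_by_year = df.groupby("Year")["TotalPay"].mean()

print(pay_by_group)
print(pay_by_year)
```

Keyword matching is only one way to group titles; it is easy to audit, but it will misfile titles that contain none of the chosen keywords, so the "Other" group is worth inspecting.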
Key findings:
- Pay Variations: Pay varies significantly across job titles and years.
- Gender Pay Gap: Men generally earn more than women in most job groups, which suggests a potential gender-based pay disparity.
- Top Earning Jobs: Firefighters, Police Officers, and Medical Professionals tend to have the highest total pay.
- Correlation: Strong correlations exist between salary components, highlighting the influence of overtime and other pay on total compensation.
- Predictive Modeling: The linear regression model can reasonably predict BasePay based on selected input features.
- Classification Modeling: Both Logistic Regression and Random Forest models show reasonable accuracy in classifying employee gender.
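As a rough illustration of the correlation and regression steps behind these findings, the sketch below computes a correlation matrix and fits a linear regression for BasePay with scikit-learn. The data are synthetic, the choice of `OvertimePay` and `Benefits` as input features is an assumption, and `TotalPay` is simplified to BasePay plus OvertimePay, so the numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data (assumption): BasePay depends mostly on Benefits.
overtime = rng.uniform(0, 30000, n)
benefits = rng.uniform(10000, 40000, n)
base = 2.0 * benefits + 0.1 * overtime + rng.normal(0, 2000, n)

df = pd.DataFrame({"BasePay": base, "OvertimePay": overtime, "Benefits": benefits})
# Simplified total for illustration only.
df["TotalPay"] = df["BasePay"] + df["OvertimePay"]

# Pairwise Pearson correlations between the salary components.
corr = df.corr()

# Fit on a training split and score on held-out rows.
X = df[["OvertimePay", "Benefits"]]
y = df["BasePay"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # coefficient of determination (R^2)

print(corr.round(2))
print(f"R^2 on held-out rows: {r2:.3f}")
```

Scoring on a held-out split rather than on the training rows gives a more honest picture of how well the model generalizes.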
Libraries used:
- numpy: For numerical computation
- pandas: For data manipulation and analysis
- seaborn: For statistical data visualization
- plotly: For interactive data visualization
- gender-guesser: For gender detection
- scikit-learn: For machine learning tasks
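Since the dataset has no gender column, gender detection with gender-guesser presumably works from employee first names. A minimal sketch of that step is shown below. The helper names `first_name`, `infer_gender`, and `build_detector` are hypothetical, and the sketch assumes names are stored as "FIRST LAST" strings.

```python
def first_name(employee_name: str) -> str:
    """Return the first token of a full name in title case ('JANE DOE' -> 'Jane')."""
    parts = employee_name.strip().split()
    return parts[0].title() if parts else ""


def infer_gender(employee_name: str, detector) -> str:
    """Guess gender from the first name using a gender-guesser style detector."""
    return detector.get_gender(first_name(employee_name))


def build_detector():
    """Create a gender-guesser Detector (requires `pip install gender-guesser`)."""
    # Imported lazily so the helpers above can be used without the package.
    import gender_guesser.detector as gender
    return gender.Detector()
```

A call would look like `infer_gender("JANE DOE", build_detector())`. The detector returns one of `male`, `female`, `mostly_male`, `mostly_female`, `andy` (androgynous), or `unknown`, so any downstream analysis has to decide how to treat the ambiguous labels.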
Install Required Libraries:
pip install numpy pandas matplotlib seaborn plotly gender-guesser scikit-learn cufflinks
This analysis provides insights into the San Francisco employee salary data, highlighting pay variations, potential gender-based pay discrepancies, and correlations between salary components. The predictive and classification models showcase the applicability of machine learning techniques to understand and potentially mitigate pay disparities.