salaries_data_analysis_using_pandas

Salaries data analysis using Pandas.

Primary language: Jupyter Notebook. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Objective:

  • Import pandas as pd
  • Read Salaries.csv as a dataframe called salary
  • Check the head of the DataFrame
  • Use the .info() method to find out how many entries there are.
  • What is the average BasePay?
  • What is the highest amount of OvertimePay in the dataset?
  • What is the job title of a particular person, e.g. GARY JIMENEZ?
  • How much does a particular person make, e.g. GARY JIMENEZ (including benefits)?
  • What is the name of the highest-paid person (including benefits)?
  • What is the name of the lowest-paid person (including benefits)?
  • What was the average (mean) BasePay of all employees per year (e.g. 2011-2014)?
  • How many unique job titles are there?
  • What are the top 5 most common jobs?
  • How many job titles were represented by only one person in 2013 (i.e. job titles with only one occurrence in 2013)?
  • How many people have the word Chief in their job title?
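
The questions above can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: it builds a tiny made-up DataFrame in place of Salaries.csv so that it runs on its own, and the column names EmployeeName, JobTitle, TotalPayBenefits and Year are assumptions about the CSV layout (only BasePay and OvertimePay are named in the objectives). The employee names and figures are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for: salary = pd.read_csv("Salaries.csv")
# All names and amounts below are made up for illustration.
salary = pd.DataFrame({
    "EmployeeName": ["ALICE A", "BOB B", "CAROL C", "DAN D", "EVE E", "FRANK F"],
    "JobTitle": ["Police Chief", "Clerk", "Clerk", "Fire Chief", "Nurse", "Clerk"],
    "BasePay": [200.0, 50.0, 60.0, 180.0, 90.0, 55.0],
    "OvertimePay": [0.0, 5.0, 10.0, 20.0, 30.0, 0.0],
    "TotalPayBenefits": [250.0, 60.0, 75.0, 220.0, 130.0, 58.0],
    "Year": [2011, 2011, 2012, 2013, 2013, 2013],
})

# Quick look at the data and the number of entries.
print(salary.head())
salary.info()

# Average BasePay and highest OvertimePay.
mean_base_pay = salary["BasePay"].mean()
max_overtime = salary["OvertimePay"].max()

# Job title and total pay (including benefits) of one person.
is_bob = salary["EmployeeName"] == "BOB B"
bob_title = salary.loc[is_bob, "JobTitle"].iloc[0]
bob_total = salary.loc[is_bob, "TotalPayBenefits"].iloc[0]

# Highest- and lowest-paid person, including benefits.
highest_paid = salary.loc[salary["TotalPayBenefits"].idxmax(), "EmployeeName"]
lowest_paid = salary.loc[salary["TotalPayBenefits"].idxmin(), "EmployeeName"]

# Mean BasePay per year.
base_pay_by_year = salary.groupby("Year")["BasePay"].mean()

# Number of unique job titles and the five most common ones.
unique_titles = salary["JobTitle"].nunique()
top_5_jobs = salary["JobTitle"].value_counts().head(5)

# Job titles held by exactly one person in 2013.
titles_2013 = salary.loc[salary["Year"] == 2013, "JobTitle"].value_counts()
single_2013 = int((titles_2013 == 1).sum())

# People whose job title contains "chief" (case-insensitive).
chief_count = int(salary["JobTitle"].str.contains("chief", case=False, na=False).sum())
```

With the real dataset, the only change needed should be replacing the hand-built DataFrame with the pd.read_csv call, provided the column names match the assumptions above.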

Steps To Run:

This demonstration uses Jupyter Notebook, an open-source web application that lets you create and share documents containing live code, equations, visualizations and narrative text.

Step 1:

Create a virtual environment and install the dependencies listed in requirements.txt.

$ pip install virtualenv

$ virtualenv env

$ source env/bin/activate

$ pip install -r requirements.txt

Step 2:

Start Jupyter Notebook and run the notebook.

$ jupyter notebook