city-school-district: A Jupyter Notebook repository from Jayplect

Description

The purpose of this project was to analyze my city's school district in order to help the school board and mayor make strategic decisions regarding future school budgets and priorities. To achieve this, I created and manipulated Pandas DataFrames to analyze the school and standardized test data.

Dependencies used

Summary of Dataset

There are two sets of data are used for this project. One of the the datasets showed the student details including Math and Reading grades while the other data set contained the schools and budget information. The data used in this analysis were simulated for the purpose of this project.

Project Steps

Step 1: Merging both data sets

The first step following series of other steps was to load both data sets and then using a left join from pandas merge both data sets on school name.

Load files using path from pathlib

school_data_to_load = Path("Resources/schools_complete.csv")
student_data_to_load = Path("Resources/students_complete.csv")

Read School and Student Data File and store into Pandas DataFrames

school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

Combine the data into a single dataset

school_data_complete = pd.merge(student_data, school_data, how="left", on=["school_name", "school_name"])

Step 2: Summary Statistics

As a next step, I created a high level snapshot (in table form) of the district and school using these key metrics below.

Summary of metrics:
- Total Schools
- Total Students
- Total Budget
- Average Math Score
- Average Reading Score
- % Passing Math (The percentage of students that passed math)
- % Passing Reading (The percentage of students that passed reading)
- % Overall Passing (The percentage of students that passed math and reading)
A table that highlights the top and bottom performing schools based on % Overall Passing was also created using the metrics below:
- School Name
- School Type
- Total Students
- Total School Budget
- Per Student Budget
- Average Math Score
- Average Reading Score
- % Passing Math (The percentage of students that passed math)
- % Passing Reading (The percentage of students that passed reading)
- % Overall Passing (The percentage of students that passed math and reading)

Step 3: Binning

To establish a relationship between school performance and average spending ranges (per student) and school size (small, medium, large) with the use 4 reasonable bins to group school spending. The Pandas function cut makes this easy to achieve.

Scores by school spending

Establish the bins

spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]

Use pd.cut to categorize spending based on the bins.

school_spending_df["Spending Ranges (Per Student)"] = pd.cut(school_spending_df["Per Student Budget"], bins=spending_bins, labels=labels)

Scores by school size

Establish the bins.

size_bins = [0, 1000, 2000, 5000]
labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

Use pd.cut on the "Total Students" column of the per_school_summary DataFrame.

per_school_summary["School Size"] = pd.cut(per_school_summary["Total Students"], bins=size_bins,labels=labels)

Summary of analysis

As a whole, schools with higher budgets, did not yield better test results. By contrast, schools with higher spending 645-675 per student actually underperformed compared to schools with smaller budgets (585 per student).
As a whole, smaller and medium sized schools dramatically out-performed large sized schools on passing math performances (89-91% passing vs 67%).
As a whole, charter schools out-performed the public district schools across all metrics due to the fact that charter schools tend to serve smaller student populations per school.

References

Data for this dataset was generated by edX Boot Camps LLC, and is intended for educational purposes only.

Jayplect/city-school-district