The purpose of this project was to analyze my city's school district in order to help the school board and mayor make strategic decisions regarding future school budgets and priorities. To achieve this, I created and manipulated Pandas DataFrames to analyze the school and standardized test data.
There are two sets of data are used for this project. One of the the datasets showed the student details including Math and Reading grades while the other data set contained the schools and budget information. The data used in this analysis were simulated for the purpose of this project.
The first step following series of other steps was to load both data sets and then using a left join from pandas merge both data sets on school name.
Load files using path from pathlib
school_data_to_load = Path("Resources/schools_complete.csv")
student_data_to_load = Path("Resources/students_complete.csv")
Read School and Student Data File and store into Pandas DataFrames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)
Combine the data into a single dataset
school_data_complete = pd.merge(student_data, school_data, how="left", on=["school_name", "school_name"])
As a next step, I created a high level snapshot (in table form) of the district and school using these key metrics below.
-
Summary of metrics:
- Total Schools
- Total Students
- Total Budget
- Average Math Score
- Average Reading Score
- % Passing Math (The percentage of students that passed math)
- % Passing Reading (The percentage of students that passed reading)
- % Overall Passing (The percentage of students that passed math and reading)
-
A table that highlights the top and bottom performing schools based on % Overall Passing was also created using the metrics below:
- School Name
- School Type
- Total Students
- Total School Budget
- Per Student Budget
- Average Math Score
- Average Reading Score
- % Passing Math (The percentage of students that passed math)
- % Passing Reading (The percentage of students that passed reading)
- % Overall Passing (The percentage of students that passed math and reading)
To establish a relationship between school performance and average spending ranges (per student) and school size (small, medium, large) with the use 4 reasonable bins to group school spending. The Pandas function cut makes this easy to achieve.
-
Establish the bins
spending_bins = [0, 585, 630, 645, 680] labels = ["<$585", "$585-630", "$630-645", "$645-680"]
Use
pd.cut
to categorize spending based on the bins.school_spending_df["Spending Ranges (Per Student)"] = pd.cut(school_spending_df["Per Student Budget"], bins=spending_bins, labels=labels)
-
Establish the bins.
size_bins = [0, 1000, 2000, 5000] labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
Use
pd.cut
on the "Total Students" column of theper_school_summary
DataFrame.per_school_summary["School Size"] = pd.cut(per_school_summary["Total Students"], bins=size_bins,labels=labels)
-
As a whole, schools with higher budgets, did not yield better test results. By contrast, schools with higher spending 645-675 per student actually underperformed compared to schools with smaller budgets (585 per student).
-
As a whole, smaller and medium sized schools dramatically out-performed large sized schools on passing math performances (89-91% passing vs 67%).
-
As a whole, charter schools out-performed the public district schools across all metrics due to the fact that charter schools tend to serve smaller student populations per school.
Data for this dataset was generated by edX Boot Camps LLC, and is intended for educational purposes only.