Using pandas to analyze school and student performance within a district
Jupyter Notebook
Observed Trends:
Looking at all the reading scores and math scores in each table, students tend to do better in reading than in math.
Looking at the top performing schools, bottom performing schools, and passing rates by school type, students have a much higher passing rate in charter schools (~95%) than in district schools (~74%).
Looking at the summary of performance by spending ranges per student, it seems that students have a higher passing rate in the lower spending ranges (<$615).
# Import dependenciesimportpandasaspdimportnumpyasnp# Create paths to csv filesschool_path="raw_data/schools_complete.csv"student_path="raw_data/students_complete.csv"# Read the csv filesschools=pd.read_csv(school_path)
schools=schools.rename(columns={"name": "school"}) # so we can merge laterstudents=pd.read_csv(student_path)
students=pd.DataFrame(students)
District Summary
# Calculate total schools, students, and budget of districttotal_schools=schools['School ID'].count()
total_students=schools['size'].sum()
total_budget=schools['budget'].sum()
# Calculate average math and reading scoresavg_math=students['math_score'].mean()
avg_read=students['reading_score'].mean()
# Calculate percent passing ratespass_math=students.loc[students['math_score'] >=70]
per_math=len(pass_math)/total_students*100pass_read=students.loc[students['reading_score'] >=70]
per_read=len(pass_read)/total_students*100overall= (per_math+per_read)/2# Create dataframe for district summarydistrict_summary=pd.DataFrame({"Total Schools": total_schools,
"Total Students": total_students,
"Total Budget": total_budget,
"Average Math Score": avg_math,
"Average Reading Score": avg_read,
"% Passing Math": per_math,
"% Passing Reading": per_read,
"% Overall Passing Rate": [overall]})
#Reorganize columns with double bracketsdistrict_summary=district_summary[["Total Schools", "Total Students", "Total Budget",
"Average Math Score", "Average Reading Score",
"% Passing Math", "% Passing Reading", "% Overall Passing Rate"]]
# Format to "$0,000.00"district_summary['Total Budget'] =district_summary['Total Budget'].map('${:,.2f}'.format)
district_summary
Total Schools
Total Students
Total Budget
Average Math Score
Average Reading Score
% Passing Math
% Passing Reading
% Overall Passing Rate
0
15
39170
$24,649,428.00
78.985371
81.87784
74.980853
85.805463
80.393158
School Summary
# Merge schools and studentsmerge_df=pd.merge(schools, students, on="school")
merge_df=merge_df.drop(['School ID', 'Student ID'], axis=1)
# Set school as index, get type, total students, total budget and per student budgetschool_index=schools.set_index('school')
school_type=school_index['type']
school_students=school_index['size']
school_budget=school_index['budget']
psb=school_budget/school_students# Set school as index and groupby school for merge_df to get average scores by schoolindex=merge_df.set_index('school')
grouped=index.groupby(['school'])
# Get average math and reading scoreschool_avg_math=grouped['math_score'].mean()
school_avg_read=grouped['reading_score'].mean()
# Calculate percent that passed mathtotal_stu=grouped['name'].count()
grouped_math=pass_math.groupby('school')
school_pass_math=grouped_math['name'].count()/total_stu*100# Calculate percent that passed readinggrouped_reading=pass_read.groupby('school')
school_pass_read=grouped_reading['name'].count()/total_stu*100# Calculate overall passing rateoverall_pass= (school_pass_math+school_pass_read)/2# Create dataframe for school summaryschool_summary=pd.DataFrame({"School Type": school_type,
"Total Students": school_students,
"Total School Budget": school_budget,
"Per Student Budget": psb,
"Average Math Score": school_avg_math,
"Average Reading Score": school_avg_read,
"% Passing Math": school_pass_math,
"% Passing Reading": school_pass_read,
"Overall Passing Rate": overall_pass})
# Reorganize school summary columns school_summary=school_summary[["School Type", "Total Students", "Total School Budget",
"Per Student Budget", "Average Math Score", "Average Reading Score",
"% Passing Math", "% Passing Reading", "Overall Passing Rate"]]
# Format to "$0,000.00"school_summary['Total School Budget'] =school_summary['Total School Budget'].map('${:,.2f}'.format)
school_summary['Per Student Budget'] =school_summary['Per Student Budget'].map('${:,.2f}'.format)
school_summary.head()
School Type
Total Students
Total School Budget
Per Student Budget
Average Math Score
Average Reading Score
% Passing Math
% Passing Reading
Overall Passing Rate
Bailey High School
District
4976
$3,124,928.00
$628.00
77.048432
81.033963
66.680064
81.933280
74.306672
Cabrera High School
Charter
1858
$1,081,356.00
$582.00
83.061895
83.975780
94.133477
97.039828
95.586652
Figueroa High School
District
2949
$1,884,411.00
$639.00
76.711767
81.158020
65.988471
80.739234
73.363852
Ford High School
District
2739
$1,763,916.00
$644.00
77.102592
80.746258
68.309602
79.299014
73.804308
Griffin High School
Charter
1468
$917,500.00
$625.00
83.351499
83.816757
93.392371
97.138965
95.265668
Top Performing Schools (By Passing Rate)
# Create top performing schools summary by passing ratetop_summary=school_summary.loc[school_summary['Overall Passing Rate'] >90]
top_summary.sort_values(['Overall Passing Rate'], ascending=False).head()