gapr is, at the moment, a small package designed to estimate distributional gaps as an effect size, along with the associated standard error, using only count data from each distribution. The method assumes that the counts come from ordered bins. The ECDF of each distribution is first approximated from these counts (the cumulative proportion within each bin). The ECDFs are then paired, and a smoothed curve is estimated using maximum likelihood. The area under the curve is then estimated and transformed to an effect size. The variance-covariance matrix associated with the curve fit is also used to estimate the standard error of the effect size. For more information, please see Ho & Reardon (2012) and Reardon & Ho (2015).
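The steps described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the area under the paired ECDFs is computed with the trapezoid rule instead of the maximum-likelihood smoothed curve, the orientation of the axes is an assumption, and the transformation V = sqrt(2) * qnorm(AUC) is the form of the V statistic described in Ho & Reardon (2012). The helper name `ecdf_from_counts` is hypothetical, and the counts are those of the Ashland Middle School example used in this README.

```r
# Counts per ordered proficiency level (Ashland Middle School, Grade 6)
counts_white    <- c(5, 13, 71, 49)
counts_hispanic <- c(3, 2, 3, 5)

# Hypothetical helper: cumulative proportion at the upper edge of each bin
ecdf_from_counts <- function(counts) cumsum(counts) / sum(counts)

# Pair the two ECDFs, starting both at the origin.
# Assumed orientation: AUC < 0.5 when the second group tends to score lower.
x <- c(0, ecdf_from_counts(counts_hispanic))
y <- c(0, ecdf_from_counts(counts_white))

# Trapezoid-rule area under the paired curve
# (a stand-in for the ML smoothed curve that gapr fits)
auc_trap <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Assumed transformation of the area to an effect size
v_trap <- sqrt(2) * qnorm(auc_trap)
```

Because the trapezoid rule connects the observed points with straight lines, `auc_trap` (about 0.42 here) will generally differ from the smoothed estimate the package reports, so `v_trap` should be read as a rough check on direction and magnitude only.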
gapr is currently available only on GitHub:
# If not previously installed, first install the remotes package
# install.packages("remotes")
remotes::install_github("datalorax/gapr")
Estimate an achievement gap for a given school as follows:
library(gapr)
ashland_middle_g6 <- oregon_schools[1:4, ]
ashland_middle_g6
#> # A tibble: 4 x 7
#> academic_year district school grade_level level hispanic_latino white
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 2014-2015 Ashland S… Ashland Midd… Grade 6 1 3 5
#> 2 2014-2015 Ashland S… Ashland Midd… Grade 6 2 2 13
#> 3 2014-2015 Ashland S… Ashland Midd… Grade 6 3 3 71
#> 4 2014-2015 Ashland S… Ashland Midd… Grade 6 4 5 49
The above represents the number of students scoring in each of 4 ordered proficiency categories on the Oregon statewide test of English/Language Arts in 2014-15 in Ashland Middle School for Grade 6, reported separately by whether the student was coded as Hispanic/Latino or White.
We can estimate the difference between these student groups using the estimate_v function. The resulting estimate (representing an effect size) effectively recovers the distributional differences as if we had the full student-level data.
estimate_v(ashland_middle_g6, "white", "hispanic_latino")
#> auc v v_se
#> 1 0.4490952 -0.1809452 0.1752831
So, in this grade, at this school, students coded Hispanic/Latino scored, on average, approximately 0.18 standard deviations below students coded White.
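To help read the three columns of the output together, the following sketch checks the reported values against each other and attaches an interval to the estimate. It rests on two assumptions that are not stated in this README: that v is obtained from auc as sqrt(2) * qnorm(auc), and that a normal approximation is adequate for a 95% interval. The numbers are copied from the output above.

```r
# Values copied from the estimate_v() output
auc  <- 0.4490952
v    <- -0.1809452
v_se <- 0.1752831

# Assumed link between the area under the curve and the effect size
v_from_auc <- sqrt(2) * qnorm(auc)

# Approximate 95% interval, assuming the estimate is roughly normal
ci <- v + c(-1, 1) * qnorm(0.975) * v_se
```

Under these assumptions `v_from_auc` agrees with the reported v to several decimal places, and the interval runs from roughly -0.52 to 0.16. It spans zero, which is unsurprising given that only 13 students in this grade were coded Hispanic/Latino.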
Currently, the function estimates the gap for only one school at a time, but future developments will include by arguments to estimate gaps by another variable (e.g., schools, districts).
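Until such arguments exist, one possible workaround is to split the data and apply a one-school estimator to each piece. The helper below, `estimate_by`, is hypothetical and not part of gapr. It assumes the estimator takes the data as its first argument and returns a one-row data frame, as the estimate_v output shown in this README suggests.

```r
# Hypothetical helper (not part of gapr): apply a one-group estimator to each
# subset of `data` defined by the columns named in `by`.
estimate_by <- function(data, by, estimator, ...) {
  # Build a plain list of grouping vectors so this works for data frames and tibbles
  groups <- lapply(by, function(col) data[[col]])
  pieces <- split(data, groups, drop = TRUE)

  # Run the estimator on each piece, passing any extra arguments through
  results <- lapply(pieces, estimator, ...)

  # Stack the one-row results; row names identify the groups
  do.call(rbind, results)
}
```

With gapr loaded, a call such as `estimate_by(oregon_schools, c("district", "school", "grade_level"), estimate_v, "white", "hispanic_latino")` would be the intended use. Whether estimate_v succeeds for every subset (for example, those with very sparse counts) is untested here, so wrapping the estimator in `tryCatch` may be prudent.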
Please note that the gapr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.