gapr

gapr is, at the moment, a small package designed to estimate distributional gaps in terms of an effect sizes, along with the associated standard error, using only count data from each distribution. This assumes that the counts are from ordered bins. The ECDF from each distribution is first approximated using these counts (cumulative proportion within each bin). The ECDF’s are then paired and a smoothed curve is estimated using maximum likelihood. The area under the curve is then estimated and transformed to an effect size. The variance-covariance matrix associated with the curve fit is also used to estimate the standard error of the effect size. For more information, please see Ho & Reardon (2012) and Reardon & Ho (2015).

Installation

gapr is currently only available on GitHub

# If not previously installed, first install the remotes package
# install.packages("remotes")

remotes::install_github("datalorax/gapr")

Example

Estimate an achievement gap for a given school as follows

library(gapr)
ashland_middle_g6 <- oregon_schools[1:4, ]
ashland_middle_g6
#> # A tibble: 4 x 7
#>   academic_year district   school        grade_level level hispanic_latino white
#>   <chr>         <chr>      <chr>         <chr>       <dbl>           <dbl> <dbl>
#> 1 2014-2015     Ashland S… Ashland Midd… Grade 6         1               3     5
#> 2 2014-2015     Ashland S… Ashland Midd… Grade 6         2               2    13
#> 3 2014-2015     Ashland S… Ashland Midd… Grade 6         3               3    71
#> 4 2014-2015     Ashland S… Ashland Midd… Grade 6         4               5    49

The above represents the number of students scoring in each of 4 ordered proficiency categories on the Oregon statewide test of English/Language Arts in 2014-15 in Ashland Middle School for Grade 6, reported separately by whether the student was coded as Hispanic/Latino or White.

We can estimate the difference between these student groups using the estimate_v function, and the resulting estimate (representing an effect size) effectively recovers the distributional differences as if we had the full student-level data.

estimate_v(ashland_middle_g6, "white", "hispanic_latino")
#>         auc          v      v_se
#> 1 0.4490952 -0.1809452 0.1752831

So, in this grade, at this school, students coded Hispanic/Latino scored, on average, approximately 0.18 standard deviations below students coded White.

Currently, the function only estimates for one school at a time, but future developments will include by arguments to estimate gaps by another variable (e.g., schools, districts).

Code of Conduct

Please note that the gapr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.