Review binning logic for Grade Distribution to handle edge cases not caught by current logic
Describe your problem or feature you'd like added
Grade Distribution binning logic should be revisited to handle an edge case distribution with specific properties:
- The difference between the 5th lowest grade and each of the grades above it is less than 2
- The difference between each of the 4 lowest grades and the 5th lowest grade is more than 2
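The two properties above can be written as a small predicate. This is only a sketch: `is_edge_case` and its parameters `protected` and `width` are hypothetical names, and the thresholds are taken literally from the wording of the bullets, so the real bin width and comparison operators may differ.

```python
def is_edge_case(grades, protected=4, width=2.0):
    """Return True when the lowest `protected` grades sit apart from the rest.

    Hypothetical helper for illustration; not the project's actual code.
    """
    ordered = sorted(grades)
    if len(ordered) <= protected:
        return False  # no 5th lowest grade to compare against

    lowest = ordered[:protected]
    pivot = ordered[protected]       # the 5th lowest grade
    above = ordered[protected + 1:]  # every grade above the 5th lowest

    # Property 1: each grade above the 5th lowest is less than `width` away.
    pivot_fits_above = all(g - pivot < width for g in above)
    # Property 2: each of the 4 lowest grades is more than `width` below it.
    lowest_is_separate = all(pivot - g > width for g in lowest)
    return pivot_fits_above and lowest_is_separate
```

With made-up input such as `is_edge_case([90, 90, 90, 90, 98, 98.5, 99])`, the predicate returns `True`: the 5th lowest grade fits with the higher grades while the four lowest are far below it.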
The current binning logic looks at the 5th lowest grade and the grades above to determine whether to bin all the lowest grades together. However, there may be cases where the lowest 4 grades should be binned for student privacy, but the 5th lowest fits with the higher grades in the histogram bins.
Example data:
(96.0, 96.0, 96.0, 96.0, 98.0, 98.0, 98.5, 98.5, 99, 99, 99.99)
The logic looks at the 5th lowest grade (98) and checks whether it is in the same bin as each subsequent grade by checking for a difference >2. In this case, all of the remaining grades (98 through 99.99) fall in the same bin, so the logic determines that everything should be placed in one bin for privacy. When the logic decides to bin all grades, we don't use the dotted line or group the lowest 4 grades together. Instead the distribution looks like the one below:
This becomes more of a problem if the lowest 4 grades are significantly lower than the 5th and above.
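To make the failure concrete, here is a rough reconstruction of the check described above, run on the example data. It is based only on the description in this issue, not on the actual source; `bins_everything` and `BIN_WIDTH` are hypothetical names.

```python
BIN_WIDTH = 2.0  # assumed from the ">2" comparison described in this issue


def bins_everything(grades, k=5):
    """Reconstruction of the described check (hypothetical, not the real code).

    Returns True when no grade above the k-th lowest differs from it by more
    than BIN_WIDTH, which is the case where the logic decides to bin all.
    """
    ordered = sorted(grades)
    if len(ordered) < k:
        raise ValueError("need at least k grades")
    pivot = ordered[k - 1]  # the 5th lowest grade when k == 5
    # Only grades at or above the pivot are compared; the lowest k - 1 grades
    # are never inspected.
    return all(g - pivot <= BIN_WIDTH for g in ordered[k:])


example = [96.0, 96.0, 96.0, 96.0, 98.0, 98.0, 98.5, 98.5, 99, 99, 99.99]
print(bins_everything(example))  # True
```

In this reconstruction the four grades of 96.0 never enter the comparison, so the check cannot tell whether they sit apart from the rest of the distribution.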
Describe the solution you'd like
We need a way to protect the privacy of lower-performing students when this edge case occurs. This may require a significant rewrite of the privacy binning logic to use a statistical analysis formula rather than the approach we are currently using.