This package gives very different results to krippendorffsalpha
Thanks for creating this. I am going to be testing inter-rater reliability for a classification system, and I wanted to establish how many samples I would need to compare to get a reasonable confidence interval.
I ran some simulations, and the width of the confidence interval does not appear to shrink as the number of samples increases.
This was surprising, so I compared it with the confidence intervals generated by the krippendorffsalpha package, which, as expected, decreased as the number of samples increased.
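For reference, here is a minimal sketch of the kind of simulation I mean: two raters labelling the same binary items, Krippendorff's alpha computed from scratch via the coincidence matrix, and a percentile bootstrap over items for the interval. All names here are my own and this does not call either package's API, so treat it as an illustration of the setup, not of your implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def krippendorff_alpha_nominal(r1, r2):
    """Krippendorff's alpha for two raters, nominal data, no missing values."""
    cats = np.unique(np.concatenate([r1, r2]))
    idx = {c: i for i, c in enumerate(cats)}
    o = np.zeros((len(cats), len(cats)))   # coincidence matrix
    for a, b in zip(r1, r2):
        o[idx[a], idx[b]] += 1
        o[idx[b], idx[a]] += 1
    n = o.sum()                             # pairable values (= 2 * items)
    nc = o.sum(axis=1)                      # marginal count per category
    d_o = n - np.trace(o)                   # observed disagreement (off-diagonal mass)
    d_e = (n * n - (nc ** 2).sum()) / (n - 1)   # expected disagreement
    return 1.0 - d_o / d_e

def bootstrap_ci(r1, r2, n_boot=2000, level=0.95):
    """Percentile bootstrap: resample whole items, keeping rater pairing intact."""
    n = len(r1)
    stats = []
    for _ in range(n_boot):
        i = rng.integers(0, n, n)
        stats.append(krippendorff_alpha_nominal(r1[i], r2[i]))
    return np.quantile(stats, [(1 - level) / 2, 1 - (1 - level) / 2])

def simulate(n_items, p_pos=0.5, p_flip=0.1):
    """Two raters on the same binary items; rater 2 flips a label with prob p_flip."""
    r1 = (rng.random(n_items) < p_pos).astype(int)
    r2 = np.where(rng.random(n_items) < p_flip, 1 - r1, r1)
    return r1, r2

for n_items in (50, 200, 800):
    r1, r2 = simulate(n_items)
    lo, hi = bootstrap_ci(r1, r2)
    print(f"n={n_items:4d}  alpha={krippendorff_alpha_nominal(r1, r2):.3f}  "
          f"95% CI width={hi - lo:.3f}")
```

With a bootstrap over items like this, the interval width falls roughly as 1/sqrt(n), which is the behaviour I see from krippendorffsalpha but not from this package.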
The only way I could get the confidence intervals to change with this package was to change the balance of classes: highly imbalanced data has wider confidence intervals, which intuitively makes sense, since there are fewer examples of the rarer class. (Reusing the hypothetical helpers from the sketch above, this effect can be reproduced by lowering the positive-class rate; see the snippet below.)
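```python
# Hold the number of items fixed and vary class balance:
# rarer positives -> noisier alpha estimate -> wider interval.
for p_pos in (0.5, 0.1, 0.02):
    r1, r2 = simulate(400, p_pos=p_pos)
    lo, hi = bootstrap_ci(r1, r2)
    print(f"p_pos={p_pos:.2f}  95% CI width={hi - lo:.3f}")
```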
Anyway, I can share the full code I used to generate the data if that would be helpful. Before I do, I wanted to check: is what I have found expected behaviour for this package? I would have expected very similar results, but perhaps I am misunderstanding what this function does and am not comparing like with like.
Sam