Negative quantile approximation with high skew/less data
shardulbee opened this issue · 4 comments
shardulbee commented
from tdigest import TDigest

digest = TDigest()
digest.batch_update([62.0, 202.0, 1415.0, 1433.0])
digest.percentile(0.25)
Returns `-136.25`. This is because in https://github.com/CamDavidsonPilon/tdigest/blob/master/tdigest/tdigest.py#L166-L167, `delta` is computed as half the difference between the means of the neighbouring centroids and is used as the slope to linearly approximate the quantile between the two centroids. In the following line, `m_i + ((p - t) / k - 1/2)*delta` is negative because `delta` is very large and `p - t = 0`, so the expression evaluates to `m_i + (-1/2)*delta`, which is negative.
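To make the arithmetic concrete, here is a minimal standalone sketch of that interpolation step. It is not the library's code. The helper name `interpolate` is hypothetical, and the sketch assumes each of the four points ends up as its own centroid with count 1 and that `delta` is half the difference between the neighbouring centroid means (the value that reproduces the reported result).

```python
# Hypothetical re-creation of the interpolation step discussed above.
# Assumptions (not taken from the library source): every input point is its
# own centroid with count 1, and delta is half the difference between the
# means of the centroids on either side.

def interpolate(m_prev, m_i, m_next, p, t, k):
    """Linear estimate around centroid mean m_i.

    p: target rank (quantile * total count)
    t: cumulative count before this centroid
    k: count of this centroid
    """
    delta = (m_next - m_prev) / 2.0
    return m_i + ((p - t) / k - 0.5) * delta

means = [62.0, 202.0, 1415.0, 1433.0]
n = len(means)

p = 0.25 * n   # target rank 1.0
t = 1.0        # one point (62.0) lies before the second centroid
k = 1.0        # the second centroid holds a single point

# p - t == 0, so the estimate collapses to m_i - delta / 2.
estimate = interpolate(means[0], means[1], means[2], p, t, k)
print(estimate)  # -136.25
```

With `delta = (1415.0 - 62.0) / 2 = 676.5`, the estimate is `202.0 - 338.25 = -136.25`, which falls below the smallest input value even though every input is positive.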
CamDavidsonPilon commented
Can you confirm your version of tdigest? I believe this is fixed in 0.4+ (at least I can't repro it on master).
SHSE commented
Still reproducible.
digest = TDigest()
digest.batch_update([62.0, 202.0, 1415.0, 1433.0])
digest.percentile(25) # returns -136.25
CamDavidsonPilon commented
@SHSE try `0.25` instead of `25`.
EDIT: nvm, still investigating.
CamDavidsonPilon commented
Corrected in #31