CamDavidsonPilon/tdigest

Negative quantile approximation with high skew/less data

shardulbee opened this issue · 4 comments

from tdigest import TDigest

digest = TDigest()
digest.batch_update([62.0, 202.0, 1415.0, 1433.0])
digest.percentile(0.25)

Returns -136.25. This is because in https://github.com/CamDavidsonPilon/tdigest/blob/master/tdigest/tdigest.py#L166-L167, delta is computed from the means of the neighbouring centroids as half their difference, (m_{i+1} - m_{i-1}) / 2, and is used as the slope to linearly approximate the quantile between the two centroids. In the following line, m_i + ((p - t) / k - 1/2) * delta becomes negative: p - t = 0, so the expression reduces to m_i - delta / 2, and delta is large compared with m_i. Here m_i = 202 and delta = (1415 - 62) / 2 = 676.5, which gives 202 - 338.25 = -136.25.
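The -136.25 can be reproduced by hand. The sketch below is a standalone re-derivation, not the library's source. It assumes each data point becomes its own centroid of weight 1, and that delta = (m_{i+1} - m_{i-1}) / 2, which is the reading of the slope that yields the reported value. For contrast it also includes a hypothetical `bounded_percentile` that interpolates between adjacent centroid means instead of stepping away from a single mean by half a slope. That is one way to keep estimates inside the data range and is not necessarily what the eventual fix does.

```python
# Hypothetical re-derivation of the interpolation discussed in this issue.
# Centroids are assumed to be (mean, count) pairs sorted by mean; neither
# function below is part of the tdigest package.


def old_style_percentile(centroids, q):
    """Estimate the q-th percentile (q in [0, 100]) by stepping from m_i with
    slope delta = (m_{i+1} - m_{i-1}) / 2."""
    n = sum(count for _, count in centroids)
    p = q * n / 100.0  # target rank
    t = 0.0            # total count of centroids before c_i
    for i, (m_i, k) in enumerate(centroids):
        if p < t + k:
            # The first and last centroids are returned unchanged.
            if i == 0 or i == len(centroids) - 1:
                return m_i
            delta = (centroids[i + 1][0] - centroids[i - 1][0]) / 2.0
            # (p - t) / k - 1/2 lies in [-1/2, 1/2), so the result can land
            # up to delta / 2 below m_i, even below every observed value.
            return m_i + ((p - t) / k - 0.5) * delta
        t += k
    return centroids[-1][0]


def bounded_percentile(centroids, q):
    """Alternative for contrast: place each mean at the midpoint of its
    centroid's rank range and interpolate linearly between adjacent means.
    The result is a convex combination of two means, so it stays inside
    [first mean, last mean]."""
    n = sum(count for _, count in centroids)
    p = q * n / 100.0
    positions = []  # (rank of the centroid's midpoint, mean)
    t = 0.0
    for m, k in centroids:
        positions.append((t + k / 2.0, m))
        t += k
    if p <= positions[0][0]:
        return positions[0][1]
    for (r0, m0), (r1, m1) in zip(positions, positions[1:]):
        if p <= r1:
            w = (p - r0) / (r1 - r0)
            return m0 + w * (m1 - m0)
    return positions[-1][1]


# Each of the four values from the report becomes its own centroid of weight 1.
centroids = [(62.0, 1), (202.0, 1), (1415.0, 1), (1433.0, 1)]
print(old_style_percentile(centroids, 25))  # -136.25
print(bounded_percentile(centroids, 25))    # 132.0
```

With a target rank of p = 1, the old-style formula reduces to 202 - 676.5 / 2, while the bounded version lands halfway between 62 and 202.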

Can you confirm your version of tdigest? I believe this is fixed in 0.4+ (at least, I can't reproduce it on master).

SHSE commented

Still reproducible.

from tdigest import TDigest

digest = TDigest()
digest.batch_update([62.0, 202.0, 1415.0, 1433.0])
digest.percentile(25) # returns -136.25

@SHSE try 0.25 instead of 25.

EDIT: nvm, still investigating.

Corrected in #31