/bqme

Python Package for Bayesian Quantile Matching Estimation

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

Bayesian Quantile Matching Estimation using Order Statistics

Documentation Status Build Status Tests Status Codecoverage Status

BQME is a package that allows users to fit a distribution to observed quantile data. The package uses Order Statistics as the noise model, which is more robust than e.g. Gaussian noise model (mean squared error). The paper describing the theory can be found on arxiv: https://arxiv.org/abs/2008.06423. The notebooks for the experiments in the paper are moved to https://github.com/RSNirwan/BQME_experiments.

BQME generates stan-code that implements the matching and then uses stan's sampling and optimizing functions for posterior samples and MAP estimate, respectively.

Install

Install latest release via pip

pip install bqme

For latest development version clone the repository and install via pip

git clone https://github.com/RSNirwan/bqme
cd bqme
pip install .

Install with dev dependencies

git clone https://github.com/RSNirwan/bqme
cd bqme
pip install -e .[dev]
# pip install -e ".[dev]"  # for ZSH users

After installing dev dependencies we can run tests

# only run fast tests
python -m pytest --cov=bqme tests/ --cov-report term-missing
# run all tests (also the one marked by slow) - roughly 10 min needed
python -m pytest --cov=bqme tests/ --slow --cov-report term-missing

Usage

Here, we fit a Normal distribution to observed quantile data using order statistics of the observed quantiles. Note that the likelihood is not a Normal distribution, but the order statistics of the observed quantiles assuming the underlying distribution is a Normal.

from bqme.distributions import Normal, Gamma
from bqme.models import NormalQM

N, q, X = 100, [0.25, 0.5, 0.75], [-0.1, 0.3, 0.8]

# define priors
mu = Normal(0, 1, name='mu')
sigma = Gamma(1, 1, name='sigma')

# define likelihood
model = NormalQM(mu, sigma)

# sample the posterior
fit = model.sampling(N, q, X)

# extract posterior samples
mu_posterior = fit.mu
sigma_posterior = fit.sigma

# get stan sample object
stan_samples = fit.stan_obj

# get pdf and cdf of x_new
x_new = 1.0
pdf_x = fit.pdf(x_new)
cdf_x = fit.cdf(x_new)

# get percent point function of q_new (inverse of cdf)
# default return values are samples from posterior predictive p(x|q)
q_new = 0.2
ppf_q = fit.ppf(q_new)  

We can also look at the generated stan code and optimize the parameters (MAP) instead of sampling the posterior.

from bqme.distributions import Normal, Gamma
from bqme.models import NormalQM

mu = Normal(0, 1, name='mu')
sigma = Gamma(1, 1, name='sigma')
model = NormalQM(mu, sigma)

# print generated stan code
print(model.code)

# optimize
N, q, X = 100, [0.25, 0.5, 0.75], [-0.1, 0.3, 0.8]
fit = model.optimizing(N, q, X)

# extract optimized parameters
mu_opt = fit.mu
sigma_opt = fit.sigma

# get pdf, cdf, ppf
pdf_x = fit.pdf(1.1)
cdf_x = fit.cdf(1.1)
ppf_q = fit.ppf(0.2)

Available prior distributions and likelihoods

distributions/priors (import from bqme.distributions):

  • Normal(mu:float, sigma:float, name:str)
  • Gamma(alpha:float, beta:float, name:str)
  • Lognormal(mu:float, sigma:float, name:str)
  • Weibull(alpha:float, sigma:float, name:str)
  • InvGamma
  • ...

models/likelihoods (import from bqme.models):

  • NormalQM(mu:distribution, sigma:distribution)
  • GammaQM(alpha:distribution, beta:distribution)
  • LognormalQM(mu:distribution, sigma:distribution)
  • WeibullQM(alpha:distribution, sigma:distribution)
  • InvGammaQM
  • ...

Inputs to the models need to be distributions.

Todos

  • make package available on PyPI
  • tag/release on github
  • github actions for testing on different os and versions
  • use sphinx as documentation tool
  • implement fit.ppf(q), fit.cdf(x), fit.pdf(x), ...
  • add Mixture-model