BQME is a package for fitting a distribution to observed quantile data. It uses order statistics as the noise model, which is more robust than, for example, a Gaussian noise model (mean squared error). The paper describing the theory is available on arXiv: https://arxiv.org/abs/2008.06423. The notebooks for the experiments in the paper have been moved to https://github.com/RSNirwan/BQME_experiments.
BQME generates Stan code that implements the matching and then uses Stan's sampling and optimizing functions for posterior samples and the MAP estimate, respectively.
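To give a feel for what an order-statistics noise model looks like, here is a minimal sketch of the log-likelihood for a Normal distribution using only the Python standard library. It is not the Stan code that BQME generates. The function name `order_stat_loglik` is hypothetical, the ranks are approximated as `N * q` (an assumption), and the normalizing constant is dropped because it does not depend on the parameters.

```python
import math
from statistics import NormalDist


def order_stat_loglik(mu, sigma, N, q, X):
    """Unnormalized log-density of the order statistics X at ranks ~ N*q,
    assuming the underlying samples are Normal(mu, sigma).
    Illustrative sketch only, not BQME's implementation."""
    dist = NormalDist(mu, sigma)
    ranks = [N * qi for qi in q]           # assumed rank of each quantile
    cdfs = [dist.cdf(x) for x in X]

    # density of the underlying distribution at each observed quantile value
    ll = sum(math.log(dist.pdf(x)) for x in X)
    # samples falling below the smallest observed quantile
    ll += (ranks[0] - 1) * math.log(cdfs[0])
    # samples falling above the largest observed quantile
    ll += (N - ranks[-1]) * math.log(1.0 - cdfs[-1])
    # samples falling between consecutive observed quantiles
    for m in range(1, len(X)):
        ll += (ranks[m] - ranks[m - 1] - 1) * math.log(cdfs[m] - cdfs[m - 1])
    return ll


N, q, X = 100, [0.25, 0.5, 0.75], [-0.1, 0.3, 0.8]
ll_near = order_stat_loglik(0.3, 0.65, N, q, X)  # parameters close to the data
ll_far = order_stat_loglik(3.0, 0.65, N, q, X)   # parameters far from the data
```

Parameters whose quantiles are close to the observed ones should receive a higher log-likelihood than parameters that place the observed values far in a tail, so `ll_near` is expected to exceed `ll_far`.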
Install the latest release via pip
pip install bqme
For the latest development version, clone the repository and install via pip
git clone https://github.com/RSNirwan/bqme
cd bqme
pip install .
Install with dev dependencies
git clone https://github.com/RSNirwan/bqme
cd bqme
pip install -e .[dev]
# pip install -e ".[dev]" # for ZSH users
After installing the dev dependencies, we can run the tests
# only run fast tests
python -m pytest --cov=bqme tests/ --cov-report term-missing
# run all tests (including those marked slow); takes roughly 10 min
python -m pytest --cov=bqme tests/ --slow --cov-report term-missing
Here, we fit a Normal distribution to observed quantile data using the order statistics of the observed quantiles. Note that the likelihood is not a Normal density but the order-statistics density of the observed quantiles, assuming the underlying distribution is Normal.
from bqme.distributions import Normal, Gamma
from bqme.models import NormalQM
N, q, X = 100, [0.25, 0.5, 0.75], [-0.1, 0.3, 0.8]
# define priors
mu = Normal(0, 1, name='mu')
sigma = Gamma(1, 1, name='sigma')
# define likelihood
model = NormalQM(mu, sigma)
# sample the posterior
fit = model.sampling(N, q, X)
# extract posterior samples
mu_posterior = fit.mu
sigma_posterior = fit.sigma
# get stan sample object
stan_samples = fit.stan_obj
# get pdf and cdf of x_new
x_new = 1.0
pdf_x = fit.pdf(x_new)
cdf_x = fit.cdf(x_new)
# get percent point function of q_new (inverse of cdf)
# default return values are samples from posterior predictive p(x|q)
q_new = 0.2
ppf_q = fit.ppf(q_new)
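If only raw samples are available, inputs of the same shape as `N, q, X` can be derived from them first. The sketch below uses the standard library's `statistics.quantiles` on synthetic data. Treating `N` as the number of underlying samples is an assumption based on the example above, and the synthetic data is purely illustrative.

```python
import random
from statistics import quantiles

random.seed(0)
# synthetic raw data standing in for real observations (illustrative only)
samples = [random.gauss(0.3, 0.8) for _ in range(100)]

N = len(samples)               # number of underlying samples (assumed meaning of N)
q = [0.25, 0.5, 0.75]          # probabilities of the quartiles
X = quantiles(samples, n=4)    # the three quartile values matching q
```

The resulting `N, q, X` have the same structure as the hard-coded values in the example and could be passed to the model in the same way.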
We can also inspect the generated Stan code and optimize the parameters (MAP) instead of sampling the posterior.
from bqme.distributions import Normal, Gamma
from bqme.models import NormalQM
mu = Normal(0, 1, name='mu')
sigma = Gamma(1, 1, name='sigma')
model = NormalQM(mu, sigma)
# print generated stan code
print(model.code)
# optimize
N, q, X = 100, [0.25, 0.5, 0.75], [-0.1, 0.3, 0.8]
fit = model.optimizing(N, q, X)
# extract optimized parameters
mu_opt = fit.mu
sigma_opt = fit.sigma
# get pdf, cdf, ppf
pdf_x = fit.pdf(1.1)
cdf_x = fit.cdf(1.1)
ppf_q = fit.ppf(0.2)
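To illustrate how the pdf, cdf, and ppf of a fitted Normal relate to each other, here is a small standard-library sketch. The values `mu_hat` and `sigma_hat` are hypothetical stand-ins for point estimates such as `fit.mu` and `fit.sigma`. Whether BQME evaluates these functions in exactly this way internally is an assumption.

```python
from statistics import NormalDist

# hypothetical point estimates standing in for fit.mu and fit.sigma
mu_hat, sigma_hat = 0.3, 0.65
fitted = NormalDist(mu_hat, sigma_hat)

x_new, q_new = 1.1, 0.2
pdf_x = fitted.pdf(x_new)       # density at x_new
cdf_x = fitted.cdf(x_new)       # P(X <= x_new)
ppf_q = fitted.inv_cdf(q_new)   # value below which a fraction q_new of the mass lies

# the ppf is the inverse of the cdf, so this recovers q_new
roundtrip = fitted.cdf(ppf_q)
```

Because the ppf inverts the cdf, `roundtrip` should equal `q_new` up to floating-point error.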
distributions/priors (import from bqme.distributions):
- Normal(mu:float, sigma:float, name:str)
- Gamma(alpha:float, beta:float, name:str)
- Lognormal(mu:float, sigma:float, name:str)
- Weibull(alpha:float, sigma:float, name:str)
- InvGamma
- ...
models/likelihoods (import from bqme.models):
- NormalQM(mu:distribution, sigma:distribution)
- GammaQM(alpha:distribution, beta:distribution)
- LognormalQM(mu:distribution, sigma:distribution)
- WeibullQM(alpha:distribution, sigma:distribution)
- InvGammaQM
- ...
Inputs to the models need to be distributions.
- make package available on PyPI
- tag/release on github
- github actions for testing on different os and versions
- use sphinx as documentation tool
- implement fit.ppf(q), fit.cdf(x), fit.pdf(x), ...
- add Mixture-model