PRISM

An alternative to MCMC for rapid analysis of models

Model dispersion with PRISM; an alternative to MCMC for rapid analysis of models

PRISM is a pure Python 3 package that provides an alternative method to MCMC for analyzing scientific models.

Introduction

Rapid technological advancements allow for both computational resources and observational/experimental instruments to become better, faster and more precise with every passing year. This leads to an ever-increasing amount of scientific data being available and more research questions being raised. As a result, scientific models that attempt to address these questions are becoming more abundant, and are pushing the available resources to the limit as these models incorporate more complex science and more closely resemble reality.

However, as the number of available models increases, they also tend to become more distinct, making it difficult to keep track of their individual qualities. A full analysis of every model would be required in order to recognize these qualities. It is common to employ Markov chain Monte Carlo (MCMC) methods and Bayesian statistics for performing this task. However, these methods are designed to approximate the posterior probability distribution function, which is more than is needed when the goal is to analyze the behavior of a model, so there must be a more efficient way of analyzing models.

PRISM tries to tackle this problem by using the Bayes linear approach, the emulation technique and history matching to construct an approximation ('emulator') of any given model. These techniques can be seen as special cases of Bayesian statistics, where limited model evaluations are combined with advanced regression techniques, covariances and probability calculations. PRISM is designed to easily facilitate and enhance existing MCMC methods by restricting plausible regions and exploring parameter space efficiently. However, PRISM can additionally be used as a standalone alternative to MCMC for model analysis, providing insight into the behavior of complex scientific models. With PRISM, the time spent on evaluating a model is minimized, providing developers with an advanced model analysis in a fraction of the time required by more traditional methods.
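To make the idea of history matching more concrete, the sketch below uses the univariate implausibility measure that is standard in the history matching literature: the distance between the emulator prediction and the data, divided by the combined standard deviation. It is only an illustration and not PRISM's implementation. The names toy_emulator and is_plausible, the fixed emulator variance and the cutoff of 3 are all assumptions made for this example.

```python
import math


def implausibility(emul_mean, emul_var, obs_value, obs_var, disc_var=0.0):
    """Distance between emulator mean and data, in combined standard deviations."""
    return abs(emul_mean - obs_value) / math.sqrt(emul_var + obs_var + disc_var)


def toy_emulator(a, x):
    """Hypothetical stand-in for an emulator of the model f(x) = a*x.

    Returns an (expectation, variance) pair; the variance is a made-up constant.
    """
    return a * x, 0.01


def is_plausible(a, data, cutoff=3.0):
    """A parameter value is kept only if no data point rules it out."""
    for x, (value, error) in data.items():
        mean, var = toy_emulator(a, x)
        if implausibility(mean, var, value, error**2) > cutoff:
            return False
    return True


# Hypothetical comparison data: {x: (value, error)}
data = {1.0: (2.0, 0.1), 2.0: (4.0, 0.1)}

# Scan a regular grid over the parameter range [0, 5]
grid = [i / 100 for i in range(501)]
plausible = [a for a in grid if is_plausible(a, data)]
fraction = len(plausible) / len(grid)

print(f"Plausible range: [{min(plausible)}, {max(plausible)}]")
print(f"Fraction of parameter space still plausible: {fraction:.3f}")
```

Only the parameter values that survive this cut would need further model evaluations, which is how the plausible region of parameter space shrinks from one iteration to the next.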

Why use PRISM?

  • Written in pure Python 3, for versatility;
  • Stores results in HDF5 files, allowing for easy user access;
  • Can be executed in serial or with MPI, on any number of processes;
  • Compatible with Windows, Mac OS and Unix-based machines;
  • Accepts any type of model and comparison data;
  • Built as a plug-and-play tool: all main classes can also be used as base classes;
  • Easily linked to any model by writing a single custom ModelLink subclass;
  • Capable of reducing relevant parameter space by factors over 100,000 using only a few thousand model evaluations;
  • Can be used alone for analyzing models, or combined with MCMC for efficient model parameter estimations.

When (not) to use PRISM?

It may look very tempting to use PRISM for basically everything, but keep in mind that emulation has its limits. Below is a general (but non-exhaustive) list of scenarios where PRISM can become really valuable:

  • In almost any situation where one wishes to perform a parameter estimation using an MCMC Bayesian analysis (by using hybrid sampling). This is especially true for poorly constrained models (low number of available observational constraints);
  • Whenever one wishes to visualize the correlation behavior between different model parameters;
  • For quickly exploring the parameter space of a model without performing a full parameter estimation. This can be very useful when trying out different sets of observational data to study their constraining power;
  • For obtaining a reasonably accurate approximation of a model in very close proximity to the optimal parameter set.

There are, however, also situations where one is better off using a different technique. A general (but non-exhaustive) list is given below:

  • For obtaining a reasonably accurate approximation of a model in all of parameter space. Due to the way an emulator is constructed, this could easily require millions of model evaluations and a lot of time and memory;
  • When dealing with a model that has a large number of parameters/degrees-of-freedom (>50). This however still heavily depends on the type of model that is used;
  • Whenever a very large number of observational constraints are available and one wishes to use all of them (unless one also has access to a large supercomputer). In this case, it is a better idea to use a full Bayesian analysis instead;
  • Whenever one wishes to obtain the posterior probability distribution function (PDF) of a model.

A very general and easy way to check if one should use PRISM is to ask oneself the question: "Would I use a full Bayesian analysis for this problem, given the required time and resources?". If the answer is 'yes', then PRISM is probably a good choice, especially as it requires nearly the same resources as a Bayesian analysis does (definition of parameter space; provided comparison data; and a way to evaluate the model).

Getting started

Installation

PRISM can be easily installed by either cloning the repository and installing it manually:

$ git clone https://github.com/1313e/PRISM
$ cd PRISM
$ pip install .

or by installing it directly from PyPI with:

$ pip install prism

PRISM can now be imported as a package with import prism. To use PRISM with MPI, mpi4py >= 3.0.0 is required (it is not installed automatically).
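Since mpi4py is not installed automatically, it can be useful to check for it before launching an MPI run. The snippet below is a minimal sketch that uses only the standard library (importlib.metadata, available in Python 3.8 and later). The helper names parse_version and mpi4py_ok are made up for this example and are not part of PRISM.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn a version string like '3.1.0rc1' into a tuple of three integers."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '3.1' to three components
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def mpi4py_ok(minimum=(3, 0, 0)):
    """Return True if mpi4py is installed with at least the given version."""
    try:
        version = metadata.version("mpi4py")
    except metadata.PackageNotFoundError:
        return False
    return parse_version(version) >= minimum


if mpi4py_ok():
    print("mpi4py >= 3.0.0 found; MPI runs should be possible.")
else:
    print("mpi4py >= 3.0.0 not found; install it before running with MPI.")
```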

The PRISM package comes with two ModelLink subclasses. These ModelLink subclasses can be used to experiment with PRISM to see how it works. The online docs and the tutorials have several examples explaining the different functionalities of the package.

Running tests

To run the PRISM test suite with pytest, all packages listed in requirements_dev.txt are required. The easiest way to run the tests is by cloning the repository, installing all requirements and then running pytest on it:

$ git clone https://github.com/1313e/PRISM
$ cd PRISM
$ pip install -r requirements_dev.txt
$ pytest

If PRISM and all development requirements are already installed, one can run the tests by running pytest in the installation directory:

$ cd <path_to_installation_directory>/prism
$ pytest

When using Anaconda, the installation directory path is probably of the form <HOME>/anaconda3/envs/<environment_name>/lib/pythonX.X/site-packages.
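If the installation directory is not known, Python can report it. The following is a small sketch using only the standard library; the helper name package_dir is made up for this example. It looks up where an importable package lives without importing it.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory of an installed top-level package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin points at its __init__.py
    return Path(spec.origin).parent


# Prints the directory that contains the installed prism package (or None)
print(package_dir("prism"))
```

The printed path is the directory in which pytest can then be run.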

Example usage

See the online docs or the tutorials for a documented explanation of this example.

# Imports
from prism import Pipeline
from prism.modellink import GaussianLink

# Define model data and create ModelLink object
model_data = {3: [3.0, 0.1], 5: [5.0, 0.1], 7: [3.0, 0.1]}
modellink_obj = GaussianLink(model_data=model_data)

# Create Pipeline object
pipe = Pipeline(modellink_obj)

# Construct first iteration of the emulator
pipe.construct()

# Create projections
pipe.project()
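As a rough illustration of what the comparison data in this example describes, the sketch below assumes that each model_data entry is read as {x: [value, error]} and that the example model is a single Gaussian of the form A*exp(-(x-B)^2/(2*C^2)). Both points are assumptions made for this sketch; the online docs remain the authoritative description of GaussianLink. The functions gaussian and residuals are hypothetical helpers and do not use PRISM itself.

```python
import math


def gaussian(x, A, B, C):
    """Single Gaussian with amplitude A, center B and width C (assumed model form)."""
    return A * math.exp(-((x - B) ** 2) / (2 * C**2))


def residuals(params, data):
    """Difference between model and data at every data point, in units of the data error."""
    return {
        x: (gaussian(x, *params) - value) / error
        for x, (value, error) in data.items()
    }


# Same comparison data as in the example above
model_data = {3: [3.0, 0.1], 5: [5.0, 0.1], 7: [3.0, 0.1]}

# A Gaussian centered on x = 5 with amplitude 5 passes through all three
# data points if its width satisfies 5*exp(-2/C^2) = 3
C = math.sqrt(2 / math.log(5 / 3))
res = residuals((5.0, 5.0, C), model_data)

for x, r in res.items():
    print(f"x = {x}: residual = {r:+.2e} sigma")
```

Under these assumptions, a parameter set near A = 5, B = 5 and C close to 2 reproduces the data, so this is the kind of region one would expect the emulator to single out as plausible.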

Community guidelines

PRISM is an open-source and free-to-use software package (and it always will be), provided under the BSD-3 license.

Users are highly encouraged to make contributions to the package or request new features by opening a GitHub issue. If you would like to contribute to the package but do not know where to start, there are quite a few ToDos in the code that may give you some inspiration. Likewise, if you find a problem or issue with PRISM, please do not hesitate to open a GitHub issue about it or post it on Gitter.

And, finally, if you use PRISM as part of your workflow in a scientific publication, please consider including an acknowledgement like "Parts of the results in this work were derived using the PRISM Python package." and citing the PRISM pipeline paper:

@ARTICLE{2019ApJS..242...22V,
    author = {{van der Velden}, E. and {Duffy}, A.~R. and {Croton}, D. and
        {Mutch}, S.~J. and {Sinha}, M.},
    title = "{Model dispersion with PRISM; an alternative to MCMC for rapid analysis of models}",
    journal = {\apjs},
    keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Physics - Computational Physics},
    year = "2019",
    month = "Jun",
    volume = {242},
    number = {2},
    eid = {22},
    pages = {22},
    doi = {10.3847/1538-4365/ab1f7d},
    archivePrefix = {arXiv},
    eprint = {1901.08725},
    primaryClass = {astro-ph.IM},
    adsurl = {http://adsabs.harvard.edu/abs/2019ApJS..242...22V},
    adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

Acknowledgements

Special thanks to Darren Croton, Alan Duffy, Michael Goldstein, Simon Mutch, Manodeep Sinha and Ian Vernon for providing many valuable suggestions and constructive feedback points. Huge thanks to James Josephides for making the PRISM logo.