Generative AI meets Probabilistic Programming.
`ppchain` is an open-source toolkit for intuitive, effective modeling.
Your copilot for building internal model representations and optimizing your Bayesian workflow.
`ppchain` aims to ease the pain of building a model.
Following the three main steps of the Bayesian data analysis process, as defined in [1], `ppchain` provides a (progressively growing) toolbox of AI-assisted functions to make your life easier along the way:
- Setting up a full probability model: a joint probability distribution for all observable and unobservable quantities in a problem. `ppchain` searches for domain knowledge about your underlying problem and helps build an internal representation that is consistent with both background knowledge and collected data.
- Conditioning on observed data: calculating and interpreting the appropriate posterior distribution, that is, the conditional probability distribution of the unobserved quantities of ultimate interest, given the observed data.
- Evaluating the fit of the model and the implications of the resulting posterior distribution: How well does the model fit the data? Are the substantive conclusions reasonable? How sensitive are the results to the modeling assumptions made?
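The three steps above can be illustrated with a minimal sketch in plain Python. It uses a conjugate Beta-Binomial model chosen purely for illustration; the function names and the numbers are hypothetical and are not part of the `ppchain` API.

```python
import random


def posterior_params(prior_a, prior_b, successes, trials):
    """Conjugate update: Beta(a, b) prior with a Binomial likelihood gives a Beta posterior."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def posterior_predictive_pvalue(a, b, successes, trials, draws=5000, seed=0):
    """Share of replicated datasets with at least as many successes as observed."""
    rng = random.Random(seed)
    at_least = 0
    for _ in range(draws):
        theta = rng.betavariate(a, b)  # draw a parameter from the posterior
        replicated = sum(rng.random() < theta for _ in range(trials))  # simulate new data
        if replicated >= successes:
            at_least += 1
    return at_least / draws


# Step 1: full probability model (illustrative prior and data, not real results).
prior_a, prior_b = 2.0, 2.0
successes, trials = 7, 20

# Step 2: condition on the observed data.
post_a, post_b = posterior_params(prior_a, prior_b, successes, trials)

# Step 3: evaluate the fit with a simple posterior predictive check.
p_value = posterior_predictive_pvalue(post_a, post_b, successes, trials)

print(f"posterior: Beta({post_a}, {post_b}), mean = {beta_mean(post_a, post_b):.3f}")
print(f"posterior predictive p-value = {p_value:.2f}")
```

A posterior predictive p-value close to 0 or 1 would suggest that the model struggles to reproduce the observed count. Realistic models are rarely conjugate and would typically be fitted with a probabilistic programming library instead.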
`ppchain` provides a (progressively growing) set of AI-assisted functions to progress through the following workflow:
- Define the problem statement
  - Problem statement (conversational AI)
  - Specify hypothesis
  - Select model type
  - Data collection method
- Formalize priors, $P(\theta)$
  - Search for background knowledge
  - Prior elicitation
  - Formalize prior distributions
  - Prior predictive checks
- Determine the likelihood function, $P(y \mid \theta)$
  - Search for background knowledge
  - Load & preprocess data
  - Formalize the likelihood function
- Compute the posterior distribution, $P(\theta \mid y) \propto P(y \mid \theta) \, P(\theta)$
  - Variable selection: identifying the subset of predictors to include in the model
  - Determine the functional form of the model
  - Fit the model to the observed data to estimate the unknown model parameters
  - Compute posterior distribution
- Run posterior inference
  - Compute posterior inference
  - Posterior predictive checking
  - Sensitivity analysis
  - Make predictions about future events
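
As a rough illustration of the prior, likelihood, and posterior stages of this workflow, the sketch below approximates $P(\theta \mid y) \propto P(y \mid \theta) \, P(\theta)$ on a grid and then repeats the computation under several priors as a simple sensitivity analysis. The model (Binomial likelihood, Beta priors), the data, and all function names are assumptions made for the example; none of them come from `ppchain`.

```python
import math


def binomial_likelihood(theta, successes, trials):
    """P(y | theta) for a Binomial observation model."""
    return math.comb(trials, successes) * theta**successes * (1 - theta) ** (trials - successes)


def beta_prior(theta, a, b):
    """Unnormalized Beta(a, b) density, standing in for P(theta)."""
    return theta ** (a - 1) * (1 - theta) ** (b - 1)


def grid_posterior(successes, trials, a, b, grid_size=1000):
    """Normalize likelihood * prior over a grid of midpoints in (0, 1)."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    unnormalized = [
        binomial_likelihood(t, successes, trials) * beta_prior(t, a, b) for t in grid
    ]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]


def grid_mean(grid, weights):
    """Posterior mean of theta under the grid approximation."""
    return sum(t * w for t, w in zip(grid, weights))


# Illustrative data and candidate priors (hypothetical values).
successes, trials = 7, 20
priors = {"flat": (1, 1), "weak": (2, 2), "strong": (10, 10)}

# Sensitivity analysis: how much does the posterior mean move with the prior?
means = {}
for name, (a, b) in priors.items():
    grid, weights = grid_posterior(successes, trials, a, b)
    means[name] = grid_mean(grid, weights)
    print(f"{name:>6} prior Beta({a}, {b}): posterior mean = {means[name]:.3f}")
```

For this particular model the posterior mean is also the predicted probability of success on the next trial, so the same numbers show how sensitive a prediction about a future event is to the choice of prior. Grid approximation only scales to a handful of parameters; it is used here because it makes the proportionality explicit.
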
- Documentation: https://ppchain.readthedocs.io
Contributions are very welcome, whether in the form of a new feature, improved infrastructure, or better documentation. Please use GitHub Flow: create a branch, add commits, and open a pull request.
If you are interested in getting further involved with the ValueGrid team, please contact us.
Usage is provided under the MIT license. See LICENSE for full details.
- Initial inspiration for `ppchain` came from Thomas Wiecki, PhD and Daniel Lee, as explained in more detail in this LinkedIn post and Medium article.
[1] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). Chapman & Hall/CRC.