A Python library for integrating model-based and judgmental forecasting
Quickstart | Docs | Examples
We'll relate three questions on the Metaculus crowd prediction platform using a generative model:
```python
import ergo

# Log into Metaculus
metaculus = ergo.Metaculus(username="ought", password="")

# Load three questions
q_infections = metaculus.get_question(3529, name="Covid-19 infections in 2020")
q_deaths = metaculus.get_question(3530, name="Covid-19 deaths in 2020")
q_ratio = metaculus.get_question(3755, name="Covid-19 ratio of fatalities to infections")

# Relate the three questions using a generative model
def deaths_from_infections():
    infections = q_infections.sample_community()
    ratio = q_ratio.sample_community()
    deaths = infections * ratio
    ergo.tag(deaths, "Covid-19 deaths in 2020")
    return deaths

# Compute model predictions for the `deaths` question
samples = ergo.run(deaths_from_infections, num_samples=5000)

# Submit model predictions to Metaculus
q_deaths.submit_from_samples(samples)
```
You can run the model yourself:
- Open this Colab
- Add your Metaculus username and password
- Select "Runtime > Run all" in the menu
- Edit the code to load other questions, improve the model, etc., and rerun
The theory behind Ergo:
- Many of the pieces necessary for good forecasting work are out there:
  - Prediction platforms
  - Probabilistic programming languages
  - Superforecasters + qualitative human judgments
  - Data science tools like numpy and pandas
  - Deep neural nets as expressive function approximators
- But they haven't been connected yet in a productive workflow:
  - It's difficult to get data in and out of prediction platforms
  - Submitting questions to these platforms takes a long time
  - The questions on prediction platforms aren't connected to decisions, or even to other questions on the same platform
  - Human judgments don't scale
  - Models often can't take into account all relevant considerations
  - Workflows aren't made explicit, so they can't be automated
- This limits forecasting's potential:
  - Few people build models
  - Few people submit questions to prediction platforms, or predict on these platforms
  - Improvements to forecasting accrue slowly
  - Most decisions are not informed by systematic forecasts
- Better infrastructure for forecasting can connect these pieces and help realize the potential of scalable, high-quality forecasting
Ergo is still at an early stage. Pre-alpha, or whatever the earliest possible stage is. Functionality and API are in flux.
Here's what Ergo provides right now:
- Express generative models in a probabilistic programming language
  - Ergo provides lightweight wrappers around Pyro functions to make the models more readable
  - Specify distributions using 90% confidence intervals, e.g. `ergo.lognormal_from_interval(10, 100)` (see the sketch after this list)
  - For Bayesian inference, Ergo provides a wrapper around Pyro's variational inference algorithm
  - Get model results as Pandas dataframes
- Interact with the Metaculus and Foretold prediction platforms
  - Load question data given question ids
  - Use community distributions as variables in generative models
  - Submit model predictions to these platforms
    - For Metaculus, we automatically fit a mixture of logistic distributions for continuous-valued questions
  - Plot community distributions
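A minimal sketch of how these pieces fit together, using only the calls shown in the quickstart above (the quantities and numbers are made up for illustration, and we assume tagged variables appear as columns of the dataframe returned by `ergo.run`):

```python
import ergo

def model():
    # Uncertain quantities specified via 90% confidence intervals
    x = ergo.lognormal_from_interval(10, 100)
    y = ergo.lognormal_from_interval(1, 5)
    z = x * y
    # Tag the derived quantity so it shows up in the results
    ergo.tag(z, "z")
    return z

# Run the model and summarize the samples for "z"
samples = ergo.run(model, num_samples=1000)
print(samples["z"].describe())
```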
WIP:
- Documentation
- Clearer modeling API
Planned:
- Interfaces for all prediction platforms
- Search questions on prediction platforms
- Use distributions from any platform
- Programmatically submit questions to platforms
- Track community distribution changes
- Common model components
- Index/ensemble models that summarize fuzzy large questions like "What's going to happen with the economy next year?"
- Model components for integrating qualitative adjustments into quantitative models
- Simple probability decomposition models
- E.g. see The Model Thinker (Scott Page)
- Better tools for integrating models and platforms
- Compute model-based predictions by constraining model variables to be close to the community distributions
- Push/pull to and from repositories for generative models
- Think Forest + Github
If there's something you want Ergo to do, let us know!
This notebook is closest to a tutorial right now:
- El Paso workflow
  - This notebook shows multi-level decomposition, Metaculus community distributions, ensembling, and beta-binomial and log-normal distributions, using part of the El Paso Covid-19 model.
The notebooks below have been created at different points in time and use Ergo in inconsistent ways. Most are rough scratchpads of work-in-progress and haven't been cleaned up for public consumption:
- Relating Metaculus community distributions: Infections, Deaths, and IFR
  - The notebook for the model shown above; uses the model to update Metaculus community distributions towards consistency
- Model-based predictions of Covid-19 spread
  - End-to-end example:
    - Load multiple questions from Metaculus
    - Compute model predictions based on assumptions and external data
    - Submit predictions to Metaculus
- Model-based predictions of Covid-19 spread using inference from observed cases
  - A version of the previous notebook that infers growth rates before and after lockdown decisions
- Show Metaculus prediction results as a dataframe; filter Metaculus questions by date and status
- Illustrates how to load all questions for a Metaculus category (in this case, the El Paso series)
Outdated Ergo notebooks:
- Predicting how long lockdowns will last in multiple locations
- Estimating the number of active Covid-19 infections in each country using multiple sources
Notebooks on the path to Ergo:
- Fitting mixtures of logistic distributions
  - How can we transform arbitrary distributions represented as samples into the "mixtures of logistics" format Metaculus uses for user submissions?
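As a rough illustration of the underlying idea (this is not Ergo's actual fitting code; the component count, initialization, and optimizer choice below are arbitrary), one way to fit a mixture of logistics to samples is to minimize the negative log-likelihood with scipy:

```python
import numpy as np
from scipy import optimize, stats

def fit_logistic_mixture(samples, n_components=2):
    """Fit a mixture of logistic distributions to 1-d samples by maximum likelihood."""
    samples = np.asarray(samples, dtype=float)
    k = n_components

    def neg_log_likelihood(params):
        locs = params[:k]
        scales = np.exp(params[k:2 * k])       # keep scales positive
        weights = np.exp(params[2 * k:])
        weights = weights / weights.sum()      # normalize mixture weights
        density = np.zeros_like(samples)
        for w, loc, scale in zip(weights, locs, scales):
            density += w * stats.logistic.pdf(samples, loc=loc, scale=scale)
        return -np.sum(np.log(density + 1e-300))

    # Spread initial component locations across the sample quantiles
    init_locs = np.quantile(samples, np.linspace(0.25, 0.75, k))
    init = np.concatenate([
        init_locs,
        np.log(np.full(k, samples.std() + 1e-6)),
        np.zeros(k),
    ])
    result = optimize.minimize(neg_log_likelihood, init, method="Nelder-Mead")
    locs = result.x[:k]
    scales = np.exp(result.x[k:2 * k])
    weights = np.exp(result.x[2 * k:])
    return weights / weights.sum(), locs, scales

# Example: recover a 2-component mixture from bimodal samples
rng = np.random.default_rng(0)
data = np.concatenate([rng.logistic(0, 1, 2000), rng.logistic(10, 2, 2000)])
print(fit_logistic_mixture(data))
```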
Ergo is an open source project and we love contributions!
There are many open issues, including plenty that are good for newcomers.
Read more about Ergo development in the docs.
Before you start implementing, open a new issue or comment on an existing one to let us know what you're planning to do. You can also ping us at ergo@ought.org first.