Research project estimating the causal impact of further education on employment and wellbeing outcomes.
This code depends on the use having access to the Household, Income and Labour Dynamics in Australia (HILDA) dataset. Please see the website for access conditions and procedures.
- install conda (https://docs.conda.io/projects/conda/en/latest/user-guide/install/windows.html)
conda env create -f environment.yml
Data_preparation.py
requires the HILDA filespart1/Combined {...}190c.sav
to be placed in the data directory.
- run
conda activate reed
- within the re-education directory run
jupyter notebook
There are many cases where we want to run models and display results over different datasets. Notebooks are good at keeping results, plots and the code used to generate them together, but we want to avoid having multiple near-identical notebooks for different inputs. To solve this issue, notebooks can be parameterised (using Papermill).
- Put all the paramters in a single code cell. Tag the cell with
parameters
. - Optionally put any exposition markdown that should be displayed in a header page, rather than within every result page, in a cell tagged
overview
. - Add the notebook and corresponding list of parameter settings to
execute_notebooks.py
The code for this is all contained in execute_notebooks
. The rough process is;
- Write a configuration file containing a Dict from Notebook to a list of parameter settings for that notebook.
- For each base notebook, parameter setting pair, Papermill generates and executes a notebook with those parameter settings.
- Custom code modifies some markdown components of the generated notebook to make them display better in Jupyterbook, generates a
_toc.yml
file and markdown files for chapter headings to group a set of parameterised results. jupyter-book
translates the generated notebooks into html.
The configuration file required by execute_notebooks
is just a python file that exposes 3 global variables.
RESULT_DIRECTORY
(the directory results should be saved in)FORCE_EXECUTION
(should existing results be overwritten)NOTEBOOKS
(a Dict from Notebook to a list of parameter settings for that notebook)
Below is a minimal example configuration that will execute the Data-Processing notebook in test mode using the general release.
from reed import RunConfig, Notebook
RESULT_DIRECTORY = 'test_results'
FORCE_EXECUTION = False # If False notebooks that already exist in results will not be re-executed
# A map from a notebook to a list of configurations to run that notebook with {Notebook -> List[RunConfig]}
NOTEBOOKS = {
Notebook("Data-Processing.ipynb", "Data Processing"): [
RunConfig('general', {'test': True, 'release': 'general'})
],
}
To run models and generate html, run python execute_notebooks.py {your_config_here.py}