Predicting Fertility Data Challenge (PreFer)

This is a template repository to prepare your submission for phase 1 of the Predicting Fertility Data Challenge (PreFer) through the Next platform. The challenge is to predict whether an individual will have a child within a three-year period (2021-2023), based on survey data from previous years (2007-2020). The data come from the LISS panel. For more information on the data challenge, please visit the website or read this paper.

ℹ️ Check out the Wiki for challenge scope, leaderboards, and frequently asked questions.

Overall workflow

Prerequisites

Make a copy of this template repository by forking and cloning it as explained here. Use your own copy of the repository to prepare your method for submission as explained here.
Make sure to allow GitHub Actions on your own repository: go to the “Actions” tab and click “I understand my workflows, go ahead and enable them.”
If you have not already done so, download the training data and codebooks via the "Download Data" task on the Next platform. ❗️Important: you are not allowed to share these datasets and you may not upload them to your GitHub repository!
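
The rule against uploading datasets can be checked locally before you commit. The following is a minimal sketch, not part of the template: the function name `find_data_files`, the list of file extensions, and the choice to allow only PreFer_fake_data.csv are all assumptions made for illustration. It scans the files in your working copy, so it does not tell you what is already staged or committed.

```python
from pathlib import Path

# Assumption: the fake data file is the only data file that may be committed.
ALLOWED = {"PreFer_fake_data.csv"}
# Assumption: downloaded datasets are most likely stored under these extensions.
DATA_SUFFIXES = {".csv", ".zip", ".sav", ".dta"}


def find_data_files(repo_dir):
    """Return repo-relative paths of files that look like datasets."""
    root = Path(repo_dir)
    hits = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue  # skip Git's internal files
        looks_like_data = path.suffix.lower() in DATA_SUFFIXES
        if path.is_file() and looks_like_data and path.name not in ALLOWED:
            hits.append(relative.as_posix())
    return sorted(hits)


if __name__ == "__main__":
    # Run from the root of your repository; an empty list means nothing was flagged.
    print(find_data_files("."))
```

If the list is not empty, move those files outside the repository folder before committing.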

ℹ️ Click here for a detailed explanation of the datasets that you have downloaded. Click here for an explanation of how to use the codebooks.

Prepare your method

To participate in the challenge you need to submit a method using this repository.

Choose your programming language: the default setup is Python. If you would like to use R, go to settings.json and change {"dockerfile": "python.Dockerfile"} into {"dockerfile": "r.Dockerfile"}. Read here how to update files in your forked repository. ℹ️ For Python, this repo assumes that your method uses the Anaconda Python distribution.
Choose the main script to work with: go to submission.py for Python or submission.R for R.
Preprocess the data: any steps to clean or preprocess the data need to be added to the clean_df function in the submission.py/submission.R script with documentation. Note: The function clean_df will also be applied to the holdout data when you submit your model. At this point, the codebooks can be useful to make sense of the data.
Train, tune, and save your model: any steps to train your model need to be added to the training.py/training.R script with documentation (e.g., code for the model, number of folds, set seed). The only function in this script is train_save_model, in which you can add the steps needed to run the model. The output of this script is your saved model. Save it in the same folder as submission.py/submission.R, by default under the name model.joblib for Python or model.rds for R. You can also save the model in another format.
Test your model on fake data: you can test your clean_df function and your model (stored in model.joblib/model.rds) on the fake data (PreFer_fake_data.csv) with the predict_outcomes function. The predict_outcomes function in submission.py/submission.R will be run on the holdout data to generate your result on the challenge leaderboard. Make sure that the outputs of your model are predicted classes (i.e., 0s and 1s) rather than, for example, probabilities. If you saved the model in another format (not 'joblib' for Python or 'rds' for R), update the way the model is loaded. Also, make sure to add or edit dependencies when required, as described here. If your method does not run on the fake data, it will not run on the holdout data. If you pass the test (i.e., predict_outcomes produces predictions rather than errors), you can start submitting your method.
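
The requirement that predictions are classes rather than probabilities can be verified before submitting. Below is a minimal sketch under stated assumptions: the helper names `to_classes` and `check_predictions` and the 0.5 threshold are hypothetical and not part of submission.py, and the predictions are plain Python lists rather than whatever structure your predict_outcomes function returns.

```python
def to_classes(probabilities, threshold=0.5):
    """Turn positive-class probabilities into 0/1 class labels.

    Assumption: a probability at or above the threshold counts as class 1.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def check_predictions(predictions):
    """Raise ValueError if any prediction is not the class 0 or 1."""
    bad = [p for p in predictions if p not in (0, 1)]
    if bad:
        raise ValueError(f"found a non-class value, e.g. {bad[0]!r}")
    return True


if __name__ == "__main__":
    probabilities = [0.08, 0.61, 0.5]      # made-up values for illustration
    classes = to_classes(probabilities)    # [0, 1, 1]
    check_predictions(classes)             # passes silently
```

Calling `check_predictions` on the raw probabilities instead would raise an error, which is the situation to catch before the method is run on the holdout data.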

ℹ️ Check out this website for guides, notebooks, and blogs to guide you through this process.

Submit your method

Submit your method via the "Submit Method" task on the Next platform by providing a link to the repository with your method (GitHub commit URL). Follow the instructions below:

Make sure that you describe your model in the description.md file in your GitHub repository and commit the changes (i.e., save the changes locally).
Push the commit (i.e. upload changed version to your online repository). ❗️Important: make sure that you only push the relevant files and make sure that you do not upload any of the datasets.
In GitHub, make sure that the checks pass:

ℹ️ If the check fails, go to the FAQ. You might need to add dependencies as described here.

On the main page of your repository, above the file list, click "commits" to view a list of commits, as described here.
Go to the commit that you want to submit, right-click on "view commit details", and then click "Copy Link Address" (see example below):

Add a submission on the Next platform by providing the URL of your GitHub commit (copied in the previous step). This commit will serve as your submission to the challenge.
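
Before pasting the link into the Next platform, you can check that it points to a specific commit rather than to the repository or a branch. This is a sketch only: the helper `is_commit_url` is hypothetical, and it assumes that the copied link has the form https://github.com/<owner>/<repo>/commit/<sha> with a full 40-character commit hash. Whether the Next platform accepts other forms is not covered here.

```python
import re

# Assumed shape of the copied link:
# https://github.com/<owner>/<repo>/commit/<40-character hexadecimal hash>
COMMIT_URL = re.compile(
    r"https://github\.com/[\w.-]+/[\w.-]+/commit/[0-9a-f]{40}"
)


def is_commit_url(url):
    """Return True if the URL looks like a link to a single GitHub commit."""
    return COMMIT_URL.fullmatch(url.strip()) is not None


if __name__ == "__main__":
    # Made-up owner, repository, and hash for illustration.
    example = "https://github.com/example-user/example-repo/commit/" + "0" * 40
    print(is_commit_url(example))
```

A link to the repository main page or to a branch does not match this pattern, which is a sign that the wrong address was copied.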

ℹ️ Leaderboards are generated at fixed time points. Check out the important dates for leaderboard submission deadlines, and check out the Wiki for more info on the leaderboards.

License

This project is licensed under the terms of the MIT license.

Acknowledgements

The code in this repository is developed by Eyra as part of the Rank program funded by ODISSEI and the NWO VIDI grant awarded to Gert Stulp. The LISS panel data is provided by Centerdata.