This project contains the code for a model to estimate trends in chlamydia incidence in Australia. The code implements a Bayesian statistical method based on a decision-pathway model producing estimates for chlamydia incidence in Australia since 2001 for male and female 16–29-year-olds.
The original code was developed for the paper:
- Hammad Ali, Ewan Cameron, Christopher C Drovandi, James M McCaw, Rebecca J Guy, Melanie Middleton, Carol El-Hayek, et al. “A New Approach to Estimating Trends in Chlamydia Incidence.” Sexually Transmitted Infections, 2015;91:513–519. doi:10.1136/sextrans-2014-051631
For the paper, estimates were generated for the 2001-2013 period. The original code has been modified slightly to produce annual estimates since 2013 for the HIV, viral hepatitis and sexually transmissible infections in Australia Annual Surveillance Report produced annually by the The Kirby Institute. The model is written in the R language (currently version 3.3.2) with the input files to produce estimates for the 2001-2017 period.
Original Model Developer: Ewan Cameron
Email: dr.ewan.cameron@gmail.com
Affiliations: School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia; Spatial Ecology & Epidemiology Group, University of Oxford, Oxford, United Kingdom
Repository Maintainer: Richard T. Gray
ORCID ID: orcid.org/0000-0002-2885-0483
Affiliation: The Kirby Institute, UNSW Sydney, Sydney NSW 2052, Australia
Directly measuring disease incidence in a population is difficult and not feasible to do routinely. The aim of this project was to develop and apply a new method for estimating at a population level the number of incident genital chlamydia infections, and the corresponding incidence rates, by age and sex using routine surveillance data.
All the model scripts, input data and outputs are stored in the main directory and 3 sub-directories. The main directory also contains an optional Rstudio project file chlamydia_model.Rproj
(this is not required for running the model). The user is required to generate the input files in the appropriate format and specify the output files for processing.
Main directory files
The following three R scripts run the model using the the .csv
input files from the data/
directory and generate the chlamydia incidence results and figures which are stored in the output/
directory. Instructions for running the code are provided in the next section.
-
abc.run.abc.R: Contains the main algorithm calling other routines in
code/
to read in the data files indata/
and runs the ABC fitting routine. Results are stored in theoutput/
directory in a folder with a name corresponding to the date and time the script was sourced. Output files are generated for each simulation with nametheta.test.N.dat
(where N is the simulation number). SMC-ABC control parameters are specified on lines 29-33. -
abc.posterior.processing.R: Generates the posterior estimates of chlamydia notifications, test counts, incidence counts and proportions, positivity, and prevalence. The user must specify the date and time directory produced by
abc.run.abc.R
and the simulation number for the final output file. The output is saved as a fileposterior.dat
in the same user specified directory. -
abc.plot.results.R: Produces the final results and PDF plots corresponding to the results in
posterior.dat
in a user specified directory inoutput/
. All results are stored in theoutput/figures/
directory. The incidence estimates for each age and sex are stored in a separateChlamydia_Incidence.csv
file. WARNING: sourcing this file will overwrite previously generated output files.
Contains all the specific R functions used in the analysis and by the main directory scripts.
Contains all the input data files (collated from publicly available data) used by the abc.run.abc.R
script. These files are stored as .csv
files containing data since 2001 to the year of analysis (currently 2020). Four input files are required for the model to run:
priors.csv
: Specifications for prior distributions describing constant parameters for all sub-populations.population.csv
: Estimated population size for males, females and overall for each age from 0 to 100 years for each year since 2001.notifications.csv
: Annual notifications for males and females aged < 15, 15-19, 20-24, 25-29, 30-33, and > 34 for each year since 2001.tests.csv
: Annual number of chlamydia tests conducted for males and females aged < 15, 15-19, 20-24, 25-29, 30-33, and > 34 for each year since 2001. NOTE: data for 2006-2007 is missing for Australia (this is specifically accounted for in the scripts).
Contains output directories storing the results produced by the main directory scripts and associated figures. Each run/source of abc.run.abc.R
produces a folder with the date and time of running. These output directories need to be managed by the user.
Results and figures for the Australian Chlamydia Cascade paper are stored in a separate file and sub-directory within the output/figures/
directory.
Running the model and its input/output requires some management by the user. Requiring some familiarity with the R language and the use of CSV files. The code is not so sophisticated that it can take care of everything automatically and the user should perform sanity checks along the way to test whether the contents of any given file have actually been read in or not.
To run the model perform the following steps:
- Update the input files in the
data/
directory to contain required data. These files must have the same cloumn format but the number of rows can vary (this number must be consistent across the files though). - Source
abc.run.abc.R
. This runs the SMC-ABC algorithm. The fits will eventually stop by themselves once the code judges any further improvements to be outweighed by the time required in computing, but the job may also be killed manually from outside if required. Nothing is likely to be gained by running the fitting code for longer than 12 hours. Simulation output will be stored in a created directory inoutput/
named with the date and time of running. The output files have the formattheta.test.N.dat
(where N is the simulation number). - Edit lines 6-7 in
abc.run.abc.R
to have the correct output date and time and simulation number (final saved run ofabc.run.abc.R
) and then source. This produces the posterior estimates for chlamydia notifications, test counts, incidence counts and proportions, positivity, and prevalence and produces aposteriors.dat
output file in associated output directory. This script should run until it stops (probably taking an hour). - Source
abc.plot.results.R
to generate the plots and incidence results.
Please contact Richard Gray (Rgray@kirby.unsw.edu.au) or Ewan Cameron (dr.ewan.cameron@gmail.com) if you have trouble running or deciphering the code.
The following publications are associated with this project and used the code in this repository to generate the results.
- The Kirby Institute. HIV, viral hepatitis and sexually transmissible infections in Australia: Annual Surveillance Report. The Kirby Institute, UNSW Australia, Sydney NSW 2052.
- For the years 2015-2019. Available from the Kirby Institute website.
- The code was used to produce annual incidence estimates for the Australian Chlamydia cascade.
-RT Gray, D Callander, JS Hocking, S McGregor, H McManus, A Dyda, C Moreira, et al. Population-Level Diagnosis and Care Cascade for Chlamydia in Australia. Sexually Transmitted Infections 2020, 96:131–36. https://doi.org/10.1136/sextrans-2018-053801. - The code was used to produce annual incidence estimates reported in the publication.
- Published results produced using version v1.1; doi:10.5281/zenodo.2631989