/Sharp2021b

Code for implementing parameter inference and information geometry techniques.

Primary LanguageJulia

Sharp2021b

Julia implementation of likelihood based inference and information geometry techniques, including geodesic curves and scalar curvature. This code is used to generate results by Sharp et al. in:

Sharp JA, Browning AP, Burrage K, Simpson MJ. 2022 Parameter estimation and uncertainty quantification using information geometry. Journal of the Royal Society Interface. (https://royalsocietypublishing.org/doi/full/10.1098/rsif.2021.0940).

A rough estimate of the order of the run-time of user callable functions is provided as a guide. These estimates are obtained on a standard university issued desktop computer (Dell Optiplex 7050), assuming that the required Julia packages are already precompiled.

This repository contains the following code and folders:

Contains code for reproducing the inference and information geometry results for the normal distributions, presented in Figures 1 and 2 of the Sharp et al. paper. Also includes supporting functions for producing optimisation, producing confidence regions, and generating information geometry results, that are used by all user callable functions in this repository.

The user callable scripts include:

NormalModel.jl

  • Code for reproducing the inference and information geometry results for the univariate normal distribution.

MvNormalModel.jl

  • Code for reproducing the inference and information geometry results for the multivariate normal distribution.

NormalModel_HT.jl

  • Code for reproducing the individual hypothesis test results for the univariate normal distribution.

MvNormalModel_HT.jl

  • Code for reproducing the individual hypothesis test results for the multivariate normal distribution.

NormalModel_HypothesisTests_Loop.jl

  • Code for reproducing the ensemble hypothesis test results for the univariate normal distribution.

MvNormalModel_HypothesisTests_Loop.jl

  • Code for reproducing the ensemble hypothesis test results for the multivariate normal distribution.

The runtime for these scripts is less than one minute.

The supporting functions and data are:

ConfidenceRegions.jl

  • Contains the confidenceregion2D function, for producing confidence regions in two dimensions.

EquidistantCircumferencePoints.jl

  • Contains the EquidistantCircumferencePoints function for producing initial velocities for geodesics, corresponding to equally distant points on the circumference of a unit circle.

InfoGeo2D.jl

  • Contains functions for computing information geometry quantities in two dimensions; including Christoffel symbols of the first and second kind (Christoffel2D, Christoffel2D_first), the right hand side of the geodesic ODEs (ODEproblem_geodesic2D!), functions for solving geodesic ODEs to a specified geodesic length (SingleGeodesic2D, conditionstop_shooting1, conditionstop_shooting2, conditionstop_shooting3, affectstop!), and a function for computing the scalar curvature, via the Riemann tensor and Ricci tensor (RiemannTensor).

optimise.jl

  • Contains the optimise function for performing nonlinear bound constrained optimisation using the BOBYQA algorithm

Linear and exponential growth models

Contains code for reproducing the inference and information geometry results for the linear and exponential growth models, presented in Figures 4 and 6 of the Sharp et al. paper.

The user callable scripts include:

Linear_Infer_a_C0.jl

  • Reproduces inference and information geometry results for the linear model, estimating unknown parameters a and C(0).

Exponential_Infer_a_C0.jl

  • Reproduces inference and information geometry results for the exponential model, estimating unknown parameters a and C(0).

Linear_Infer_a_StDev.jl

  • Reproduces inference and information geometry results for the linear model, estimating unknown parameters a and the standard deviation of the data.

Exponential_Infer_a_StDev.jl

  • Reproduces inference and information geometry results for the exponential model, estimating unknown parameters a and the standard deviation of the data.

Linear_Infer_a_StDev.jl

  • Reproduces inference and information geometry results for the linear model, estimating unknown parameter C(0) and the standard deviation of the data.

Exponential_Infer_a_StDev.jl

  • Reproduces inference and information geometry results for the exponential model, estimating unknown parameter C(0) and the standard deviation of the data.

The runtime of each of these scripts is on the order of one minute.

The supporting functions and data are:

DataGenerateLinear.jl

  • Generates synthetic data for the linear model.

DataGenerateExponential.jl

  • Generates synthetic data for the exponential model.

FIM_Obs_Process.jl

  • Computes the Fisher information matrix for the observation process of the linear and exponential models.

LinearFisherInformation.jl

  • Computes the Fisher information matrix for the linear model.

ExponentialFisherInformation.jl

  • Computes the Fisher information matrix for the exponential model.

Linear_Jacobian.jl

  • Computes the Jacobian of the linear model with respect to the parameters.

Exponential_Jacobian.jl

  • Computes the Jacobian of the exponential model with respect to the parameters.

Linear_partials.jl

  • Computes the partial derivatives of the linear model with respect to the parameters, required to form Jacobian matrices.

Exponential_partials.jl

  • Computes the partial derivatives of the exponential model with respect to the parameters, required to form Jacobian matrices.

LinearSolAllParams.jl

  • Computes the analytical solution of the Linear model.

ExponentialSolAllParams.jl

  • Computes the analytical solution of the exponential model.

Logistic growth model

Contains code for reproducing the inference and information geometry results for the logistic growth model, presented in Figures 3, 5, 7 and 8 of the Sharp et al. paper.

The user callable scripts include:

Logistic_Infer_r_C0.jl

  • Reproduces inference and information geometry results for the logistic model, estimating unknown parameters r and C(0).

Logistic_Infer_r_C0_highcurvature.jl

  • Reproduces inference and information geometry results for the logistic model, estimating unknown parameters r and C(0) in the high curvature region.

Logistic_Infer_r_K.jl

  • Reproduces inference and information geometry results for the logistic model, estimating unknown parameters r and K.

Logistic_Infer_r_C0_highcurvature_HT.jl

  • Reproduces hypothesis test results for the logistic model, estimating unknown parameters r and C(0) in the high curvature region.

The runtime of each of these scripts is on the order of one minute.

The supporting functions and data are:

DataGenerateLogistic.jl

  • Generates synthetic data for the logistic model.

DataGenerateLogistic_Hypothesistesting.jl

  • Generates synthetic data for the logistic model, but does not set the random seed; so that it can be called in a loop to generate synthetic datasets.

FIM_Obs_Process.jl

  • Computes the Fisher information matrix for the observation process of the logistic model.

LogisticFisherInformation.jl

  • Computes the Fisher information matrix for the logistic model.

Logistic_Jacobian.jl

  • Computes the Jacobian of the logistic model with respect to the parameters.

Logistic_partials.jl

  • Computes the partial derivatives of the logistic model with respect to the parameters, required to form Jacobian matrices.

LogisticDataplot.jl

  • Produces plots of the logistic data and model

LogisticSolAllParams.jl

  • Computes the analytical solution of the logistic model.

Logistic_data.csv

  • Contains coral area coverage data from the Australian Institute of Marine Science Long-term Monitoring Program, as presented in: MJ Simpson, AP Browning, DJ Warne, OJ Maclaren, RE Baker (2021). Parameter identifiability and model selection for sigmoid population growth models. Journal of Theoretical Biology.

SIR model

Contains code for reproducing the inference and information geometry results for the susceptible, infected, recovered (SIR) model, presented in Figures 9 and 10 of the Sharp et al. paper.

The user callable scripts include:

SIR_Infer_beta_gamma_obsIonly.jl

  • Reproduces inference and information geometry results for the SIR model where only the infected population is observed, estimating unknown parameters β and γ.

SIR_Infer_beta_gamma.jl

  • Reproduces inference and information geometry results for the SIR model where all three populations are observed, estimating unknown parameters β and γ.

SIR_Infer_beta_StDev_obsIonly.jl

  • Reproduces inference and information geometry results for the SIR model where only the infected population is observed, estimating unknown parameter β and the standard deviation of the data.

SIR_Infer_beta_StDev.jl

  • Reproduces inference and information geometry results for the SIR model where all three populations are observed, estimating unknown parameter β and the standard deviation of the data.

The runtime of each of these scripts is on the order of a few hours.

The supporting functions and data are:

DataGenerateSIR_obsIonly.jl

  • Generates synthetic data for the SIR model where only the infected population is observed.

DataGenerateSIR.jl

  • Generates synthetic data for the SIR model where all three populations are observed.

FIM_Obs_Process_SIR_obsIonly.jl

  • Computes the Fisher information matrix for the observation process of the SIR model where only the infected population is observed.

FIM_Obs_Process_SIR.jl

  • Computes the Fisher information matrix for the observation process of the SIR model where all three populations are observed.

FisherSIR_obsIonly.jl

  • Computes the Fisher information matrix for the observation process of the SIR model where only the infected population is observed.

FisherSIR.jl

  • Computes the Fisher information matrix for the SIR model, based on the observation process and the model Jacobian.

SIR_Jacobian_obsIonly.jl

  • Computes the Jacobian of the SIR model with respect to the parameters, where only the infected population is observed.

SIR_Jacobian.jl

  • Computes the Jacobian of the SIR model with respect to the parameters, where all three populations are observed.

SIR_partials.jl

  • Computes the partial derivatives of the SIR model with respect to the parameters, required to form Jacobian matrices.

SIRdataplot.jl

  • Produces plots of the SIR data and model

SIRODE.jl

  • Contains a function for the right hand side of the SIR model ODEs.

SIRSolAllParams.jl

  • Computes the approximate numerical solution of the SIR model.

MurrayData.csv

  • Contains influenza outbreak data from Murray JD. 2002 Mathematical Biology I: an Introduction, 3rd ed. Heidelberg: Springer.

Key packages

Key Julia packages used in this work include DifferentialEquations.jl (Rackauckas, 2017), Distributions.jl (Besancon, 2021), FiniteDifferences.jl, ForwardDiff.jl (Revels, 2016), LinearAlgebra.jl and NLopt.jl with the BOBYQA algorithm (Powell, 2009).

Besançon M, Papamarkou T, Anthoff D, Arslan A, Byrne S, Lin D, Pearson J. 2021 Distributions.jl: Definition and Modeling of Probability Distributions in the JuliaStats Ecosystem. Journal of Statistical Software 98, 1--30. (doi.org/10.18637/jss.v098.i16).

Powell MJD. 2009 The BOBYQA algorithm for bound constrained optimization without derivatives. Technical report. Department of Applied Mathematics and Theoretical Physics. Cambridge, England.

Rackauckas C, Nie Q. 2017 DifferentialEquations.jl – A Performant and Feature-Rich Ecosystem for Solving Differential Equations in Julia. The Journal of Open Research Software 5, 1--10. (doi.org/10.5334/jors.151).

Revels J, Lubin M, Papamarkou T. 2016 Forward-mode automatic differentiation in Julia. arXiv [Mathematical Software], 1--4. (https://arxiv.org/pdf/1607.07892.pdf).

Reproducibility

Results in the manuscript are produced using Julia v1.5.4. Data is generated using pseudo-random numbers; the stream of generated numbers may differ following a Julia version change (https://docs.julialang.org/en/v1.5/stdlib/Random/), regardless of the set random seed. For reproducibility we provide the origianally generated data from which the results are produced, however this is only necessary for identical reproduction of the results in the manuscript, and similar results can be generated without loading the original data. To run the scripts using the original data, replace the DataGenerate...() line with code that loads the appropriate data, for example in the script SIR_Infer_beta_gamma.jl:

replace

Ntime,Ndata = DataGenerateSIR([β,γ,S₀,I₀,R₀],TimePoints,NperT,σ) #generate synthetic data

with

using DelimitedFiles

Data = readdlm("SIRData.csv",',',Float64)

Ntime = vec(Data[:,1])

Ndata = Data[:,2:4]

It is also necessary to ensure that the values for TimePoints, and NperT are consistent with the loaded data. For example, when loading the data from LogisticData_earlymid.csv, ensure that TimePoints = [1000/365.25,2500/365.25], and NperT = [15,15], and when loading LogisticData_earlymidlate.csv, TimePoints = [1000/365.25,2500/365.25,4000/365.25], and NperT = [10,10,10].