/EMpht.jl

[Julia Package] Fitting Phase-Type Distributions using an EM Algorithm

Primary LanguageJuliaOtherNOASSERTION

EMpht.jl

A Julia port of the EMpht.c program, used for fitting Phase-Type distributions via an EM algorithm.

The original C — which is available on Søren Asmussen's website — is well documented and has a decent performance for phase-type distributions with a small or moderate number of phases. However it is quite slow for when the number of phases is large (>= 20), and the UX is very old-school Unix.

This port is much simpler and faster. See the examples given below and the test cases.

Examples

To fit a phase-type distribution to some data:

using Distributions
using EMpht

data = rand(Exponential(1/10), 1_000)  # Generate some data to fit 
sample = EMpht.Sample(obs=data)        # Create an EMpht Sample object with this data
ph = empht(sample, p=5)                # Fit the data using p=5 phases

xGrid = range(0, 8, length=1_00)       # Create a grid to evaluate the density function over
fitPDFs = pdf.(ph, xGrid)              # The probability density function of the fitted phase-type

The default structure of the phase-type is "Coxian" (see below for details). For large values of p the "CanonicalForm1" is recommended. To impose no structure on the phase-type, use "General", though the results degrade quickly with p > 5 or so. Another available option is "GeneralisedCoxian".

phGen = empht(sample, p=20, ph_structure="General")
phCox = empht(sample, p=20, ph_structure="Coxian")
phCF1 = empht(sample, p=50, ph_structure="CanonicalForm1")

If the data is not fully observed, i.e. the data is binned (interval-censored), then the Sample object is updated like so:

# The intervals
int = [1.5 2.0; 2.0 2.5; 2.5 3.0; 3.0 3.5; 3.5 4.0; 4.0 4.5; 4.5 5.0; 5.0 5.5;
        5.5 6.0; 6.0 6.5; 6.5 7.0; 7.0 7.5]
# The number of observations falling into each interval
intweight = [4.0, 34.0, 107.0, 170.0, 202.0, 222.0, 140.0, 77.0, 24.0, 14.0,
        4.0, 2.0]
 # Create an EMpht Sample object with this data
sInt = EMpht.Sample(int=int, intweight=intweight)

# Fitting the interval-censored data
phCF1 = empht(sInt, p=100, ph_structure="CanonicalForm1")
xGrid = range(0, 8, length=1_000)
fitPDFs = pdf.(phCF1, xGrid)

To choose the algorithm used to fit the data (see papers below for details):

phunif = empht(sample, p=5, method=:unif)  # Fit using the uniformization technique (default)
phode = empht(sample, p=5, method=:ode)    # Fit using the more traditional ODE solving technique

EMpht.jl can read all necessary information from a JSON file (the number of phases to fit, the special structure of the phase-type, the sample to fit). For example, if you download the Coxian100.json file inside the test directory, the following will launch a fit based on those parameters:

ph100 = empht("Coxian100.json")

Resources

The relevant papers for the algorithms are:

  • S. Asmussen, O. Nerman & M. Olsson, Fitting phase-type distribution via the EM algorithm, Scandinavian Journal of Statistics 23, 419-441 (1996),
  • M. Olsson, Estimation of phase-type distributions from censored data, Scandinavian Journal of Statistics 23, 443-460 (1996).
  • H. Okamura, T. Dohi, K.S. Trivedi, A refined EM algorithm for PH distributions, Performance Evaluation 68, 938-954 (2011)
  • H. Okamura, T. Dohi, K.S. Trivedi, Improvement of expectation-maximization algorithm for phase-type distributions with grouped and truncated data, Appl. Stochastic Models Bus. Ind. 29, 141-156 (2013)

Some case studies using this package are:

  • S. Asmussen, P.J. Laub, H. Yang, Phase-type models in life insurance: fitting and valuation of equity-linked benefits, Risks 7(1), 17 pages (2019)
  • A. Vuorinen, The blockchain propagation process: a machine learning and matrix analytic approach, University of Melbourne Masters Thesis (2019), see website or thesis.