Current approaches to modeling arrival times are ad-hoc implementations of neural networks or tree-based predictors. These are not particularly built for the problem of dealing with partially observed data. In survival analysis, for example, a majority of the observations are unobserved (i.e. subjects have not died yet) but we want to estimate their age-of-death in the interest of prolonging it.
Similar to WTTE-RNN
(linked here),
we adopt a parametric approach.
However, we do not attempt to model the hazard rate,
hence we do not require strict assumptions
such as memoryless-ness,
which is a problem encountered in WTTE-RNN
.
We assume that there are only as many arrival times as there are events
and we observe conditional versions of them
for each time t
.
First we define the conditional excess random variable Z_t
.
This is the remaining time-to-event
conditioned on the true time-between-event
exceeding that which has already been observed
(i.e. the time since previous event).
Let Y_i
be the time between the i-1
-th and i
-th arrival
and N(t)
be the number of arrivals by time t
.
We are interested in Y_{N(t)+1}
,
which is the time to the next arrival.
Hence we define the conditional excess random variable as follows.
Z_t = Y_{N(t)+1} - time-since-event(t) | Y_{N(t)+1} > time-since-event(t)
Since the recurrent neural network (RNN)
outputs parameters k_t
at each time t
for the distribution of the true time to next arrival Y_{N(t)+1}
,
we can define a likelihood loss,
which is the quality of prediction at time t
.
Suppose that f(.|k), S(.|k)
are the density f(t) = lim_{t to 0} P(t-h < Y < t+h) / 2h
and survival functions S(t) = P(Y > t)
for Y
which are parametrized by k
.
Then this induces a distribution on Z_t = Y - tse(t) | Y > tse(t)
which we assume are f_t(.|k), S_t(.|k)
(i.e. S_t(s) = P(Z_t > s) = P(Y > s + tse(t) | Y > tse(t))
).
If the next time-to-event is observed before the end of training period, we have an uncensored observation.
loss_t = f*(time-to-event(t)|k_t)
Otherwise, we have a censored observation.
loss_t = S*(time-to-end(t)|k_t)
Once we have a set of optimal parameters for the RNN,
we can then obtain the optimal predicted parameters k_t
.
Only the last set of parameters is required for predicting next arrival times
(i.e. assume this is k_T
).
The predicted density of Y
given tse(T)
is f_T(.|k_T)
.
In
examples/CMAPSS/
you'll find code to run the model as described
with tensorflow
and keras
.
Only matrnn_objective.py
is strictly required to fit the model.
A demo for illustration is provided in
examples/demo.ipynb
where we predict event arrivals for 4 event types with fixed inter-arrival times.
matrnn_distributional.py
has functions for the Weibull distribution to use outside of training.matrnn_fitter.py
defines the methods for training and inference.matrnn_objective.py
describes the log-likelihoods used in model training.
For the input matrix, the loss function expects a y
matrix of shape
(n_observations, n_sequence, n_eventtypes, 4)
.
The last axis contains all the arrival-time information for the process
you want to model
4 = length((tse, tte, unc, purchstatus))
.
tse(t)
is time since previous event ifpurchstatus(t)==1
and time since start if otherwisette(t)
is time to next event ifunc(t)==1
and time to end of training if otherwiseunc(t)
is1
if next event is observed by end of training and0
otherwisepurchstatus(t)
is1
if first event has occurred and0
otherwise
You'll need to tweak the fitting and training methods for production.
The generation of the (tse, tte, unc, purchstatus)
time-series
converts a sparse table of arrivals into a dense matrix,
making this an i/o bottleneck.
A distribution for the prediction for next arrival time is useful when you have sparse and highly dependent multivariate arrival times.
You can set your estimates to minimize some exogenous loss function that is based on a random time-to-arrival by minimizing the expected loss, or output probabilities of arrival in some time period as well as a point estimate (take the mode of prediction).
A naive approach would be to bootstrap your estimates by consecutively running predicted times to next arrivals until the end of the time period but this is likely to give massive variances in your estimates. If you're expecting a few (i.e. more than one) arrivals in each time period, it might be better to consider a stochastic intensity model. I'm quite sure there are a few stochastic intensity models using RNNs but I can do it if there aren't any.
@inproceedings{Chen2018MultivariateForecasting,
title = {{Multivariate Arrival Times with Recurrent Neural Networks for Personalized Demand Forecasting}},
year = {2018},
booktitle = {Proceedings of International Conference of Data Mining DMS Workshop},
author = {Chen, Tianle and Keng, Brian and Moreno, Javier}
}