Extensive Deep Temporal Point Process

This is an official source code for implementation on Extensive Deep Temporal Point Process, which is composed of the following three parts:

1. REVIEW on methods on deep temporal point process

2. PROPOSITION of a framework on Granger causality discovery

3. FAIR empirical study

Reviews

We first conclude the recent research topics on deep temporal point process as four parts:

· Encoding of history sequence

· Relational discovery of events

· Formulation of conditional intensity function

· Learning approaches for optimization

By dismantling representative methods into the four parts, we list their contributions on temporal point process.

Methods with the same learning approaches:

Methods	History Encoder	Intensity Function	Relational Discovery	Learning Approaches	Released codes
RMTPP	RNN	Gompertz	/	MLE with SGD	https://github.com/musically-ut/tf_rmtpp
ERTPP	LSTM	Gaussian	/	MLE with SGD	https://github.com/xiaoshuai09/Recurrent-Point-Process
CTLSTM	CTLSTM	Exp-decay + softplus	/	MLE with SGD	https://github.com/HMEIatJHU/neurawkes
FNNPP	LSTM	FNNIntegral	/	MLE with SGD	https://github.com/omitakahiro/NeuralNetworkPointProcess
LogNormMix	LSTM	Log-norm Mixture	/	MLE with SGD	https://github.com/shchur/ifl-tpp
SAHP	Transformer	Exp-decay + softplus	Attention Matrix	MLE with SGD	https://github.com/QiangAIResearcher/sahp_repo
THP	Transformer	Linear + softplus	Structure learning	MLE with SGD	https://github.com/SimiaoZuo/Transformer-Hawkes-Process
DGNPP	Transformer	Exp-decay + softplus	Bilevel Structure learning	MLE with SGD	No available codes until now.

Methods focusing on learning approaches:

Reinforcement learning:
- U. Upadhyay, A. De, and M. Gomez Rodriguez, "Deep reinforcement learning of marked temporal point processes"
- S. Li, S. Xiao, S. Zhu, N. Du, Y. Xie, and L. Song, "Learning temporal point processes via reinforcement learning"
Adversarial and discrimitive learning:
- J. Yan, X. Liu, L. Shi, C. Li, and H. Zha, "Improving maximum likelihood estimation of temporal pointprocess via discriminative and adversarial learning"
Noise contrastive learning:
- R. Guo, J. Li, and H. Liu, "Initiator: Noise-contrastiveestimation for marked temporal point process"

Expansions:

Granger causality framework

The workflows of the proposed granger causality framework:

Experiments shows improvements in fitting and predictive ability in type-wise intensity modeling settings. And the Granger causality graph can be obtained:

Learned Granger causality graph on Stack Overflow

Fair empirical study

The results is showed in the Section 6.3. Here we give an instruction on implementation.

Installation

Requiring packages:

pytorch=1.8.0=py3.8_cuda11.1_cudnn8.0.5_0
torchvision=0.9.0=py38_cu111
torch-scatter==2.0.8

Dataset

We provide the MOOC and Stack Overflow datasets in ./data/

And Retweet dataset can be downloaded from Google Drive. Download it and copy it into ./data/retweet/

To preprocess the data, run the following commands

python /scripts/generate_mooc_data.py
python /scripts/generate_stackoverflow_data.py
python /scripts/generate_retweet_data.py

Training

You can train the model with the following commands:

python main.py --config_path ./experiments/mooc/config.yaml
python main.py --config_path ./experiments/stackoverflow/config.yaml
python main.py --config_path ./experiments/retweet/config.yaml

The .yaml files consist following kwargs:

log_level: INFO

data:
  batch_size: The batch size for training
  dataset_dir: The processed dataset directory
  val_batch_size: The batch size for validation and test
  event_type_num: Number of the event types in the dataset. {'MOOC': 97, "Stack OverFlow": 22, "Retweet": 3}

model:
  encoder_type: Used history encoder, chosen in [FNet, RNN, LSTM, GRU, Attention]
  intensity_type: Used intensity function, chosen in [LogNormMix, GomptMix, LogCauMix, ExpDecayMix, WeibMix, GaussianMix] and 
        [LogNormMixSingle, GomptMixSingle, LogCauMixSingle, ExpDecayMixSingle, WeibMixSingle, GaussianMixSingle, FNNIntegralSingle],
        where *Single means modeling the overall intensities
  time_embed_type: Time embedding, chosen in [Linear, Trigono]
  embed_dim: Embeded dimension
  lag_step: Predefined lag step, which is only used when intra_encoding is true
  atten_heads: Attention heads, only used in Attention encoder, must be a divisor of embed_dim.
  layer_num: The layers number in the encoder and history encoder
  dropout: Dropout ratio, must be in 0.0-1.0
  gumbel_tau: Initial temperature in Gumbel-max
  l1_lambda: Weight to control the sparsity of Granger causality graph
  use_prior_graph: Only be true when the ganger graph is given, chosen in [true, false]
  intra_encoding: Whether to use intra-type encoding,  chosen in [true, false]

train:
  epochs: Training epoches
  lr: Initial learning rate
  log_dir: Diretory for logger
  lr_decay_ratio: The decay ratio of learning rate
  max_grad_norm: Max gradient norm
  min_learning_rate: Min learning rate
  optimizer: The optimizer to use, chosen in [adam]
  patience: Epoch for early stopping 
  steps: Epoch numbers for learning rate decay. 
  test_every_n_epochs: 10
  experiment_name: 'stackoverflow'
  delayed_grad_epoch: 10
  relation_inference: Whether to use graph discovery, chosen in [true, false],
        if false, but intra_encoding is true, the graph will be complete.
  
gpu: The GPU number to use for training

seed: Random Seed

shinsonwu/EDTPP