

Extensive Deep Temporal Point Process

This is the official source code for Extensive Deep Temporal Point Process, which consists of the following three parts:

1. REVIEW of methods for deep temporal point processes

2. PROPOSITION of a framework for Granger causality discovery

3. FAIR empirical study

Reviews

We first summarize recent research on deep temporal point processes into four parts:

· Encoding of history sequence

· Relational discovery of events

· Formulation of conditional intensity function

· Learning approaches for optimization

By decomposing representative methods into these four parts, we list their contributions to temporal point processes.

Methods sharing the same learning approach:

| Methods | History Encoder | Intensity Function | Relational Discovery | Learning Approaches | Released code |
|---|---|---|---|---|---|
| RMTPP | RNN | Gompertz | / | MLE with SGD | https://github.com/musically-ut/tf_rmtpp |
| ERTPP | LSTM | Gaussian | / | MLE with SGD | https://github.com/xiaoshuai09/Recurrent-Point-Process |
| CTLSTM | CTLSTM | Exp-decay + softplus | / | MLE with SGD | https://github.com/HMEIatJHU/neurawkes |
| FNNPP | LSTM | FNNIntegral | / | MLE with SGD | https://github.com/omitakahiro/NeuralNetworkPointProcess |
| LogNormMix | LSTM | Log-norm Mixture | / | MLE with SGD | https://github.com/shchur/ifl-tpp |
| SAHP | Transformer | Exp-decay + softplus | Attention Matrix | MLE with SGD | https://github.com/QiangAIResearcher/sahp_repo |
| THP | Transformer | Linear + softplus | Structure learning | MLE with SGD | https://github.com/SimiaoZuo/Transformer-Hawkes-Process |
| DGNPP | Transformer | Exp-decay + softplus | Bilevel Structure learning | MLE with SGD | No code released yet. |
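All methods in the table are trained by maximum likelihood. The sketch below shows what is being maximized for a single event type: the sum of log-intensities at the observed events minus the integral of the intensity (the compensator) over the observation window. The intensity is loosely modeled on the "Exp-decay + softplus" form, and as a simplifying assumption its parameters `mu`, `eta` and `gamma` are fixed scalars, whereas in CTLSTM or SAHP they are produced by the history encoder. All function names here are hypothetical, not part of this repository.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def exp_decay_intensity(t: float, t_last: float, mu: float, eta: float, gamma: float) -> float:
    # Intensity on (t_last, next event]: starts at softplus(eta) right after the
    # last event and decays towards softplus(mu) at rate gamma.
    return softplus(mu + (eta - mu) * math.exp(-gamma * (t - t_last)))


def log_likelihood(times: Sequence[float], t_end: float, mu: float, eta: float,
                   gamma: float, n_grid: int = 200) -> float:
    """sum_i log lambda(t_i) - integral_0^t_end lambda(t) dt for one sequence.

    The compensator is approximated with the trapezoidal rule on each
    inter-event interval; the decay is assumed to start at t = 0.
    """
    ll = 0.0
    t_last = 0.0
    bounds = list(times) + [t_end]
    for i, t in enumerate(bounds):
        h = (t - t_last) / n_grid
        vals = [exp_decay_intensity(t_last + k * h, t_last, mu, eta, gamma)
                for k in range(n_grid + 1)]
        ll -= h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoidal rule
        if i < len(times):  # t_end is a censoring time, not an event
            ll += math.log(exp_decay_intensity(t, t_last, mu, eta, gamma))
        t_last = t
    return ll


if __name__ == "__main__":
    # Toy sequence; a neural TPP would output mu, eta, gamma per interval and
    # maximize this quantity with SGD via automatic differentiation.
    print(log_likelihood([0.5, 1.2, 2.0], t_end=3.0, mu=0.1, eta=1.0, gamma=2.0))
```

When `eta == mu` the intensity is constant, and the result reduces to the homogeneous Poisson log-likelihood `n * log(c) - T * c`, which is a handy sanity check.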

Methods focusing on learning approaches:

Expansions:

Granger causality framework

The workflow of the proposed Granger causality framework:

Experiments show improvements in fitting and predictive ability in type-wise intensity modeling settings. In addition, the Granger causality graph can be obtained:

Learned Granger causality graph on Stack Overflow
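The config below exposes `gumbel_tau` (a Gumbel temperature) and `l1_lambda` (a sparsity weight for the Granger causality graph). One common way to make a discrete graph learnable with those two knobs is a binary Gumbel-softmax (relaxed Bernoulli) sample per edge plus an L1 penalty on edge probabilities. The sketch below illustrates that generic technique under this assumption; it is not taken from this repository, and `sample_graph` and `l1_penalty` are hypothetical names.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_graph(logits: Matrix, tau: float, rng: random.Random) -> Matrix:
    """Relaxed binary sample of every edge; logits[i][j] scores 'type j Granger-causes type i'."""
    graph = []
    for row in logits:
        sampled = []
        for logit in row:
            u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
            noise = math.log(u) - math.log(1.0 - u)  # logistic noise = difference of two Gumbels
            sampled.append(_sigmoid((logit + noise) / tau))
        graph.append(sampled)
    return graph


def l1_penalty(logits: Matrix, l1_lambda: float) -> float:
    # Sparsity term on the expected adjacency sigmoid(logits)
    return l1_lambda * sum(_sigmoid(x) for row in logits for x in row)


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [[2.0, -1.0, 0.0], [0.5, 1.5, -2.0], [-3.0, 0.0, 1.0]]  # made-up scores, 3 types
    print(sample_graph(logits, tau=0.5, rng=rng))
    print(l1_penalty(logits, l1_lambda=0.01))
```

A lower `tau` pushes samples towards hard 0/1 edges; starting from a higher temperature and annealing it is a typical design choice, which would match `gumbel_tau` being described as an *initial* temperature.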

Fair empirical study

The results are shown in Section 6.3. Here we give instructions for running the implementation.

Installation

Required packages:

pytorch=1.8.0=py3.8_cuda11.1_cudnn8.0.5_0
torchvision=0.9.0=py38_cu111
torch-scatter==2.0.8
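Before training, it may help to confirm that the installed versions match the list above. The sketch below uses only the standard library's `importlib.metadata`; it assumes the conda `pytorch` package registers itself under the distribution name `torch`, and it ignores local version tags such as `+cu111`. `check_versions` is a hypothetical helper, not part of the repository.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions from the requirements list; distribution names are assumptions.
REQUIRED = {"torch": "1.8.0", "torchvision": "0.9.0", "torch-scatter": "2.0.8"}


def check_versions(required: Dict[str, str]) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each package to (installed version or None, whether it matches)."""
    report = {}
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Compare only the public version, dropping local tags like "+cu111"
        ok = found is not None and found.split("+")[0] == wanted
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_versions(REQUIRED).items():
        print(f"{name}: found {found}, {'OK' if ok else 'MISMATCH'}")
```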

Dataset

We provide the MOOC and Stack Overflow datasets in ./data/

The Retweet dataset can be downloaded from Google Drive. Download it and copy it into ./data/retweet/

To preprocess the data, run the following commands:

python ./scripts/generate_mooc_data.py
python ./scripts/generate_stackoverflow_data.py
python ./scripts/generate_retweet_data.py

Training

You can train the model with the following commands:

python main.py --config_path ./experiments/mooc/config.yaml
python main.py --config_path ./experiments/stackoverflow/config.yaml
python main.py --config_path ./experiments/retweet/config.yaml
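The preprocessing and training commands can be chained per dataset. The sketch below assumes it is run from the repository root, that the preprocessing scripts live in ./scripts/, and that each experiment has a config at ./experiments/<dataset>/config.yaml as listed above. It defaults to a dry run that only prints the commands; `build_commands` and `run` are hypothetical helpers.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

DATASETS = ["mooc", "stackoverflow", "retweet"]


def build_commands(dataset: str) -> List[List[str]]:
    """Preprocessing then training commands for one dataset."""
    return [
        [sys.executable, f"./scripts/generate_{dataset}_data.py"],
        [sys.executable, "main.py", "--config_path", f"./experiments/{dataset}/config.yaml"],
    ]


def run(dataset: str, dry_run: bool = True) -> List[List[str]]:
    # Retweet is not shipped with the repository and must be downloaded first
    if dataset == "retweet" and not Path("./data/retweet").is_dir():
        raise FileNotFoundError("Download the Retweet data into ./data/retweet/ first")
    cmds = build_commands(dataset)
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing step
    return cmds
```

Calling `run("stackoverflow", dry_run=False)` would preprocess Stack Overflow and then train on it; `check=True` makes a failed preprocessing step abort before training starts.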

The .yaml files contain the following kwargs:

log_level: INFO

data:
  batch_size: The batch size for training
  dataset_dir: The processed dataset directory
  val_batch_size: The batch size for validation and test
  event_type_num: Number of event types in the dataset: {"MOOC": 97, "Stack Overflow": 22, "Retweet": 3}

model:
  encoder_type: Used history encoder, chosen in [FNet, RNN, LSTM, GRU, Attention]
  intensity_type: Used intensity function, chosen in [LogNormMix, GomptMix, LogCauMix, ExpDecayMix, WeibMix, GaussianMix] or
        [LogNormMixSingle, GomptMixSingle, LogCauMixSingle, ExpDecayMixSingle, WeibMixSingle, GaussianMixSingle, FNNIntegralSingle],
        where the *Single variants model the overall intensity rather than type-wise intensities
  time_embed_type: Time embedding, chosen in [Linear, Trigono]
  embed_dim: Embedding dimension
  lag_step: Predefined lag step, which is only used when intra_encoding is true
  atten_heads: Attention heads, only used in Attention encoder, must be a divisor of embed_dim.
  layer_num: Number of layers in the encoder and history encoder
  dropout: Dropout ratio, must be in 0.0-1.0
  gumbel_tau: Initial temperature in Gumbel-max
  l1_lambda: Weight to control the sparsity of Granger causality graph
  use_prior_graph: Set to true only when the Granger graph is given, chosen in [true, false]
  intra_encoding: Whether to use intra-type encoding, chosen in [true, false]

train:
  epochs: Training epochs
  lr: Initial learning rate
  log_dir: Directory for the logger
  lr_decay_ratio: The decay ratio of learning rate
  max_grad_norm: Max gradient norm
  min_learning_rate: Min learning rate
  optimizer: The optimizer to use, chosen in [adam]
  patience: Early-stopping patience, in epochs
  steps: Epochs at which the learning rate is decayed
  test_every_n_epochs: 10
  experiment_name: 'stackoverflow'
  delayed_grad_epoch: 10
  relation_inference: Whether to use graph discovery, chosen in [true, false];
        if false while intra_encoding is true, a complete graph is used.
  
gpu: The index of the GPU to use for training

seed: Random seed
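Several of the kwargs above carry constraints (allowed choices, `atten_heads` dividing `embed_dim`, dropout in 0.0-1.0, a fixed number of event types per dataset). The sketch below checks a config that has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. The mapping from `experiment_name` to event-type counts assumes names like `'stackoverflow'` from the example; `validate` is a hypothetical helper, not part of the repository.

```python
from typing import Any, Dict, List

ENCODERS = {"FNet", "RNN", "LSTM", "GRU", "Attention"}
MIXTURES = ["LogNormMix", "GomptMix", "LogCauMix", "ExpDecayMix", "WeibMix", "GaussianMix"]
INTENSITIES = set(MIXTURES) | {m + "Single" for m in MIXTURES} | {"FNNIntegralSingle"}
TIME_EMBEDS = {"Linear", "Trigono"}
# Hypothetical experiment_name -> event_type_num mapping
EVENT_TYPES = {"mooc": 97, "stackoverflow": 22, "retweet": 3}


def validate(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    errors: List[str] = []
    data, model, train = cfg["data"], cfg["model"], cfg["train"]
    if model["encoder_type"] not in ENCODERS:
        errors.append(f"unknown encoder_type {model['encoder_type']!r}")
    if model["intensity_type"] not in INTENSITIES:
        errors.append(f"unknown intensity_type {model['intensity_type']!r}")
    if model["time_embed_type"] not in TIME_EMBEDS:
        errors.append(f"unknown time_embed_type {model['time_embed_type']!r}")
    # atten_heads only matters for the Attention encoder and must divide embed_dim
    if model["encoder_type"] == "Attention" and model["embed_dim"] % model["atten_heads"] != 0:
        errors.append("atten_heads must be a divisor of embed_dim")
    if not 0.0 <= model["dropout"] <= 1.0:
        errors.append("dropout must be in [0.0, 1.0]")
    expected = EVENT_TYPES.get(train.get("experiment_name", ""))
    if expected is not None and data["event_type_num"] != expected:
        errors.append(f"event_type_num should be {expected} for {train['experiment_name']}")
    return errors
```

Running such a check before `main.py` turns a shape mismatch deep inside the attention layer into a readable message at startup.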