salesforce/causalai

Can the PC algorithm, or other causal discovery algorithms such as Granger causality or VARLINGAM, support multi-dimensional time series data (3D data)?

Yes, all of these algorithms support causal discovery for multi-dimensional time series data. Please see the tutorials at https://github.com/salesforce/causalai/tree/main/tutorials.

What should we pass as var_names for 3-dimensional data when creating a TimeSeriesData object?

You can assign any variable names you like. var_names should be a list of strings, of length 3 in your case since you have 3 variables. Alternatively, you can pass var_names=None, in which case the names will be enumerated as 0, 1, 2 (integers) by default.

Hello @devansh-arpit
By 3-dimensional data I didn't mean data with 3 variables. I meant a NumPy array containing multiple 2-dimensional time series arrays, where each 2D array holds the time series data for one entity. I tried var_names=None. It does not raise an error when creating the TimeSeriesData object, but it does raise an error when calling the run method of the PC algorithm.

I am sharing a snapshot of the data for clarity.

Can you share any sample code where multiple time series are used?
(https://github.com/salesforce/causalai/assets/54168151/06ed9b49-6606-40c2-8394-435d2c515309)

Please take a look at the Multi-Data Object section in this tutorial. To handle multiple time series, replace your 3D NumPy array (say, one containing M different time series) with M separate 2D NumPy arrays. Then pass all of these arrays to the TimeSeriesData object as follows:

data_obj = TimeSeriesData(data_array1, data_array2, data_array3, var_names=None)

Here I have assumed that you have three 2D NumPy arrays, but you can pass as many arrays as you need to TimeSeriesData.

Note that the number of variables must be the same in all these arrays. The length of time series is allowed to be different for each time series.
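As a minimal sketch of the splitting step described above, the snippet below turns one 3D array into a list of 2D arrays. It assumes the 3D array is laid out as (entities, time steps, variables); that axis order and the helper name split_entities are assumptions for illustration, not part of causalai. The final TimeSeriesData call is left as a comment and simply unpacks the list into the call form shown earlier in this thread.

```python
import numpy as np


def split_entities(data_3d):
    """Split an (M, T, N) array into a list of M arrays of shape (T, N).

    Hypothetical helper: assumes axis 0 indexes entities, axis 1 time
    steps, and axis 2 variables.
    """
    data_3d = np.asarray(data_3d)
    if data_3d.ndim != 3:
        raise ValueError(f"expected a 3D array, got {data_3d.ndim} dimensions")
    return [data_3d[i] for i in range(data_3d.shape[0])]


# Synthetic stand-in data: 3 entities, 100 time steps, 4 variables.
rng = np.random.default_rng(0)
data_3d = rng.normal(size=(3, 100, 4))

arrays = split_entities(data_3d)

# Each element is now a separate 2D array that can be passed positionally,
# following the call form shown in this thread:
# data_obj = TimeSeriesData(*arrays, var_names=None)
```

If the entities have different lengths, the data cannot be stored as a single regular 3D array in the first place, so a plain Python list of 2D arrays is the natural container.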

Thanks @devansh-arpit 🙂.

What does the p_value threshold signify? What value should it ideally have?

The p-value threshold is used by our algorithms to decide whether a statistic measuring causal strength is significant (a causal edge exists) or should be ignored (no causal edge). A threshold of 0.05 is common, but you can use a stricter value such as 0.01 if you want to reduce false positives.
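To illustrate the effect of the threshold, here is a small sketch that is independent of causalai. The p-values are made-up numbers, the function significant_edges is hypothetical, and the strict "less than" comparison is an assumption about how such a cutoff is typically applied, not a statement about the library's internals.

```python
def significant_edges(p_values, threshold=0.05):
    """Keep only candidate edges whose p-value falls below the threshold."""
    return {edge: p for edge, p in p_values.items() if p < threshold}


# Hypothetical p-values for three candidate edges (parent, child).
p_values = {
    ("x0", "x2"): 0.003,
    ("x1", "x2"): 0.04,
    ("x0", "x1"): 0.30,
}

loose = significant_edges(p_values, threshold=0.05)   # keeps two edges
strict = significant_edges(p_values, threshold=0.01)  # keeps one edge
```

Lowering the threshold from 0.05 to 0.01 drops the borderline edge with p = 0.04, which is the trade-off described above: fewer false positives, at the cost of possibly missing weaker true edges.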

Got it