Inquiry about Transfer Entropy Results using MultivariateTE's analyse_single_target Method
peanutnim opened this issue
Dear IDTxl Research and Development Team,
I hope this email finds you well. I am writing to ask for clarification about an issue I encountered while using the "analyse_single_target" method of IDTxl's MultivariateTE class to analyze the stock price volatility of several companies.
Initially, I imported the stock volatility data of 18 companies and ran the "analyse_single_target" method, which found transfer entropy from 8 of these companies to the target company. I then removed the data of two companies that showed no transfer entropy to the target company and ran the analysis again. To my surprise, two companies that had shown transfer entropy in the first run disappeared from the output adjacency matrix, and transfer entropy appeared instead for two different companies.
Could you please help me understand why this happens? Why would the transfer entropy results change after removing the data of two companies that had no transfer entropy to the target company?
I would greatly appreciate any insights or guidance you can provide on this matter. Thank you in advance for your time and attention.
Looking forward to hearing from you.
The multivariate TE seeks to identify the minimal set of parents that can best form a model for predicting the target's dynamic updates. If those two companies were included in the model, then they should have had (statistically significant) TE to the target - perhaps not pairwise TE at the first round, but conditional TE on the other selected nodes. So when you say that they did not have any transfer entropy, I'm assuming that's reported from the pairwise TE at the first round.
When you remove them from the data set, of course they will not appear in the output adjacency matrix when you run the analysis again (I presume that's not what you're surprised about). The two new companies would be appearing because, when you build the model without the two companies you removed, including these two new companies now significantly improves the (reduced) model. Presumably what they contribute is redundant with the removed companies, so they could not be included while the former companies were available: the former companies were a better choice, and the new ones added nothing beyond them. (Or perhaps there was not enough statistical power to include all of them in the model.) Hope that makes sense.
I'm going to close this issue since it's not a bug. You can post to the Google group if you need further explanation (or reopen if there really is a bug that I'm missing here).
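A toy sketch of the redundancy effect described above, in plain Python: a greedy forward selection that adds, at each step, the lag-1 source with the largest conditional mutual information (CMI) given the sources already selected. This is a simplified stand-in, not IDTxl's algorithm: it uses a plug-in estimator on binary data, a fixed threshold in bits instead of permutation tests, and it ignores the target's own past. All names and data (`cmi`, `greedy_select`, `x1`, `x2`, `x3`) are hypothetical.

```python
import random
from collections import Counter
from math import log2


def cmi(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete sequences.

    z holds one tuple per sample (the empty tuple when nothing is conditioned on).
    """
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    return sum(
        c / n * log2(c * c_z[zk] / (c_xz[(xk, zk)] * c_yz[(yk, zk)]))
        for (xk, yk, zk), c in c_xyz.items()
    )


def greedy_select(target, sources, threshold=0.02):
    """Greedily add the lag-1 source with the largest CMI given those already chosen.

    Simplifications: a fixed threshold (bits) replaces permutation tests, and the
    target's own past is not conditioned on.
    """
    y = target[1:]                                         # target at time t
    past = {name: s[:-1] for name, s in sources.items()}  # sources at time t-1
    selected = []
    while True:
        if selected:
            z = list(zip(*(past[name] for name in selected)))
        else:
            z = [()] * len(y)
        gains = {name: cmi(past[name], y, z) for name in past if name not in selected}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] < threshold:
            break
        selected.append(best)
    return selected


random.seed(0)
n = 2000
x1 = [random.randint(0, 1) for _ in range(n)]
x2 = [b ^ (random.random() < 0.05) for b in x1]  # noisy copy of x1, i.e. redundant
x3 = [random.randint(0, 1) for _ in range(n)]    # unrelated to the target
y = [0] + [x1[t - 1] ^ (random.random() < 0.1) for t in range(1, n)]  # driven by x1

sources = {"x1": x1, "x2": x2, "x3": x3}
full = greedy_select(y, sources)
reduced = greedy_select(y, {k: v for k, v in sources.items() if k != "x1"})
print(full, reduced)
```

With `x1` available, `x2` adds almost nothing once `x1` is conditioned on, so only `x1` is selected; removing `x1` from the candidates lets `x2` in. This mirrors how a removed company can be replaced by one that carried redundant information.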
However, I removed the variables that did not have transfer entropy to the target variable. Yet in the second run's adjacency matrix, the two companies that had transfer entropy to the target in the first run no longer had any, and two other companies that originally had none suddenly showed transfer entropy to the target company. Unfortunately, my request to join the Google group has not been approved yet.
Hi Michael,
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
from idtxl.multivariate_te import MultivariateTE
from idtxl.data import Data
from idtxl.visualise_graph import plot_network

# Load the volatility series and index them by date
df = pd.read_csv('/Users/yana/Desktop/idtxl1/2.csv')
df['Date'] = pd.to_datetime(df['date'])
df = df.set_index('Date')
# Assumes the CSV's first column is 'date'; the remaining columns are company series
columns_to_calculate = df.columns[1:]
data_matrix = df[columns_to_calculate].to_numpy()
# Rows are time samples, columns are processes (companies): dim_order 'sp'
data = Data(data_matrix, dim_order='sp')

network_analysis = MultivariateTE()
settings = {'cmi_estimator': 'JidtKraskovCMI',
            'max_lag_sources': 5,
            'min_lag_sources': 1}
# Infer the parents of process 0 (the first company column)
results = network_analysis.analyse_single_target(settings=settings, data=data, target=0)
print(results.get_single_target(0, fdr=False))
results.print_edge_list(weights='max_te_lag', fdr=False)
plot_network(results=results, weights='max_te_lag', fdr=False)
plt.show()
I judged whether there was transfer entropy from each source company to the target company by looking at the directed graph and the adjacency matrix, and I don't think there is a problem with that. The same thing happened when I added more companies' volatility series to the dataset: after rerunning the code, some of the original companies' transfer entropy to the target company disappeared and some remained, while some newly added companies, as well as some that initially showed no transfer entropy to the target, now did. I'm very confused about this. What do you think?
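One possible contributor, offered as a hedged sketch rather than a diagnosis: IDTxl's inclusion step uses a maximum-statistic test (see the "maximum statistic" line in the log later in this thread), which compares the best candidate against the maximum of the surrogate values over all remaining candidates. The more (source, lag) candidates there are, the higher that maximum tends to be, so adding or removing companies moves the bar every candidate must clear; because later steps condition on what was already selected, an early change can propagate. Permutation p-values are also random, so borderline sources can flip between runs. The code assumes independent standard-normal surrogate statistics purely to make the arithmetic exact; real surrogate CMI values are neither normal nor independent. `n_candidates` and `max_stat_threshold` are hypothetical helpers.

```python
from statistics import NormalDist


def n_candidates(n_sources, min_lag, max_lag):
    """Number of (source, lag) candidates when every lag in [min_lag, max_lag] is tried."""
    return n_sources * (max_lag - min_lag + 1)


def max_stat_threshold(k, alpha=0.05):
    """Critical value for the maximum of k independent standard-normal surrogates.

    P(max <= t) = Phi(t) ** k, so t solves Phi(t) = (1 - alpha) ** (1 / k).
    """
    return NormalDist().inv_cdf((1 - alpha) ** (1 / k))


# Lags 1..5 as in the settings above; the target itself is not a source
k_18 = n_candidates(17, 1, 5)  # 18 companies in the data set
k_16 = n_candidates(15, 1, 5)  # after removing two companies
for k in (1, k_16, k_18):
    print(k, round(max_stat_threshold(k), 2))
```

Under these assumptions the threshold grows monotonically with the number of candidates, which is the direction of the effect; its size in a real analysis depends on the estimator and the data.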
Hi Michael,
Thanks for your kind help. I'm sorry to bother you again.
I ran the same code twice, changing only the data, and got an error when setting fdr=True for the graphs. Here is the error:
Traceback (most recent call last):
File "/Users/yana/IDTxl/idtxl/results.py", line 463, in get_single_target
return self._single_target_fdr[target]
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/yana/Desktop/idtxl1/zailaiyibian.py", line 28, in <module>
print(results.get_single_target(0,fdr=True))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yana/IDTxl/idtxl/results.py", line 469, in get_single_target
raise RuntimeError(
RuntimeError: No FDR-corrected results for target 0. Set fdr=False to see uncorrected results.
I think the error may be related to the fdr setting, but I am unsure how to properly set it for the analyse_single_target method of MultivariateTE.
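The traceback shows that `get_single_target(0, fdr=True)` finds no FDR-corrected entry for target 0 and raises `RuntimeError`. As far as I understand IDTxl, FDR correction is applied at the network level, across targets, by `analyse_network` (controlled by its `'fdr_correction'` setting), so results from `analyse_single_target` hold only uncorrected entries and `fdr=False` is the setting to use with them, including for `print_edge_list` and `plot_network`; please check this against the IDTxl documentation. The sketch wraps that fallback in a hypothetical helper, `single_target_results`, and exercises it with a hypothetical `StubResults` class that mimics the behaviour in the traceback.

```python
class StubResults:
    """Hypothetical stand-in for an IDTxl results object, used only for the demo.

    Like the traceback, it has no FDR-corrected entry, so fdr=True raises RuntimeError.
    """

    def get_single_target(self, target, fdr=True):
        if fdr:
            raise RuntimeError(
                f"No FDR-corrected results for target {target}. "
                "Set fdr=False to see uncorrected results."
            )
        return f"uncorrected results for target {target}"  # placeholder value


def single_target_results(results, target):
    """Return (results_for_target, fdr_used), falling back to uncorrected results."""
    try:
        return results.get_single_target(target, fdr=True), True
    except RuntimeError:
        # No FDR-corrected results were stored for this target
        return results.get_single_target(target, fdr=False), False


res, fdr_used = single_target_results(StubResults(), 0)
print(fdr_used, res)
```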
Thank you for your help, your assistance is greatly appreciated.
I deeply appreciate your help. I will recheck my data and indices. I'm sure your amazing work will help a lot in my own research.
If I encounter similar issues in the future, I will consult with you again. Thank you for your assistance and guidance!
Hi Michael,
I'm sorry to bother you again. I have converted my data into a two-dimensional numpy array, and I am unsure how the concept of replications applies in this context, so I don't know the appropriate way to add replications.
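A note on how replications map to arrays, as a hedged sketch: IDTxl's `Data` class orders data as processes × samples × replications (`'psr'`), and a 2-D array passed with `dim_order='sp'`, as in the code earlier in this thread, is treated as a single replication. If it is reasonable to treat the series as stationary, one option is to cut it into non-overlapping segments and use them as pseudo-replications; whether that suits volatility data is a methodological decision, and each segment presumably loses its first few samples to the lag embedding. `to_replications` is a hypothetical helper, and the demo array is placeholder data.

```python
import numpy as np


def to_replications(data_matrix, seg_len):
    """Split a (samples, processes) array into non-overlapping segments.

    Returns an array ordered (processes, samples, replications), i.e. 'psr'.
    Trailing samples that do not fill a whole segment are dropped.
    """
    n_samples, n_proc = data_matrix.shape
    n_rep = n_samples // seg_len
    arr = data_matrix[: n_rep * seg_len].T  # (processes, n_rep * seg_len)
    # [p, r, j] -> [p, j, r]: sample j of segment r becomes sample j of replication r
    return arr.reshape(n_proc, n_rep, seg_len).transpose(0, 2, 1)


demo = np.arange(22).reshape(11, 2)  # 11 samples of 2 processes (placeholder values)
psr = to_replications(demo, seg_len=5)
print(psr.shape)
```

The result could then be passed as `Data(psr, dim_order='psr')`.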
During my analysis I also encountered the following warning, and I would appreciate your help in understanding what it means and how to address it:
WARNING: Number of replications is not sufficient to generate the desired number of surrogates. Permuting samples in time instead.
maximum statistic, n_perm: 200
Could you please provide clarification regarding the meaning and implications of this warning message? What does it indicate when the number of replications is considered insufficient? How does idtxl handle this situation by permuting samples in time?
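To illustrate what "permuting samples in time" means conceptually: with only one replication there are no other trials to shuffle, so surrogates are built by shuffling the source's time points, which destroys its alignment with the target while keeping its distribution of values. The sketch tests plain mutual information between two binary sequences with a plug-in estimator; it is not IDTxl's implementation, which tests conditional quantities with the configured estimator. `mutual_information` and `time_permutation_test` are hypothetical names, and `n_perm=200` matches the value in the log line above.

```python
import random
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(x)
    c_xy, c_x, c_y = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * log2(c * n / (c_x[a] * c_y[b])) for (a, b), c in c_xy.items())


def time_permutation_test(source, target, n_perm=200, seed=0):
    """p-value for I(source; target) against surrogates made by shuffling source in time."""
    rng = random.Random(seed)
    observed = mutual_information(source, target)
    surrogate = list(source)
    n_exceed = 0
    for _ in range(n_perm):
        rng.shuffle(surrogate)  # breaks the time alignment between source and target
        if mutual_information(surrogate, target) >= observed:
            n_exceed += 1
    # The smallest reachable p-value is 1 / (n_perm + 1), so n_perm sets the resolution
    return observed, (n_exceed + 1) / (n_perm + 1)


gen = random.Random(1)
source = [gen.randint(0, 1) for _ in range(500)]
coupled = [s ^ (gen.random() < 0.1) for s in source]  # target driven by the source
unrelated = [gen.randint(0, 1) for _ in range(500)]   # independent target
print(time_permutation_test(source, coupled))
print(time_permutation_test(source, unrelated))
```

Shuffling individual time points also destroys the source's autocorrelation, which matters for strongly autocorrelated series such as volatility; it may be worth checking the IDTxl documentation for the permutation schemes it offers for that case.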
Additionally, I have been searching for documentation or information pertaining to the replication parameter in idtxl, but have been unable to find any specific details. Could you kindly provide information on how to set and adjust the replication parameter? What is the default value, and how does it impact the analysis?
After encountering the aforementioned warning message, does idtxl automatically adjust any settings or parameters? If so, could you please explain the automatic adjustments that occur after encountering this warning?
I am grateful for your time and support in addressing these concerns. As a student using idtxl, your guidance would greatly contribute to my research analysis. Thank you for your dedication in developing and maintaining idtxl.
Best,
Pitkin