Networks are vital tools for understanding and modeling interactions in complex systems in science and engineering, and direct and indirect interactions are pervasive in all types of networks. However, quantitatively disentangling direct and indirect relationships in networks remains a formidable task. Here, we present a framework, called iDIRECT (Inference of Direct and Indirect Relationships with Effective Copula-based Transitivity), for quantitatively inferring direct dependencies in association networks. Using copula-based transitivity, iDIRECT eliminates/ameliorates several challenging mathematical problems, including ill-conditioning, selflooping, and interaction strength overflow.
With simulation data as benchmark examples, iDIRECT showed high prediction accuracies. Application of iDIRECT to reconstruct gene regulatory networks in Escherichia coli also revealed considerably higher prediction power than the bestperforming approaches in the DREAM5 (Dialogue on Reverse Engineering Assessment and Methods project, #5) Network Inference Challenge. In addition, applying iDIRECT to highly diverse grassland soil microbial communities in response to climate warming showed that the iDIRECT-processed networks were significantly different from the original networks, with considerably fewer nodes, links, and connectivity, but higher relative modularity. Further analysis revealed that the iDIRECT-processed network was more complex under warming than the control and more robust to both random and target species removal (P < 0.001).
As a general approach, iDIRECT has great advantages for network inference, and it should be widely applicable to infer direct relationships in association networks across diverse disciplines in science and engineering.
The full manuscript is available at https://doi.org/10.1073/pnas.2109995119. Please email naijia.xiao@ou.edu for questions regarding the program.
The main functions for iDIRECT is included in the script idirect.pyc
, with
auxiliary functions in the scripts file_handler.pyc
and net_handler.pyc
. A
demonstration script demo.py
is included as a minimal example.
The modules can be imported by
import idirect as idir
import file_handler as fh
import net_handler as nh
For debugging purpose, the program starting time is recorded
from time import time
t0 = time()
The observed total association network is read from the file result/demo.txt
as a list of weighted edges
G,n = fh.read_file_weighted_edges("result/demo.txt", t0)
The content of the file result/demo.txt
is shown below
N1 N2 0.7
N2 N3 0.7
N1 N3 0.5
The observed total association network G is saved as a dictionary of dictionaries
>>> G
{'N1': {'N2': 0.7, 'N3': 0.5}, 'N2': {'N1': 0.7, 'N3': 0.7}, 'N3': {'N2': 0.7, 'N1': 0.5}}
Then iDIRECT is called to calculate the direct association network
S,err = idir.direct_association(G, t0=t0)
The direct association network S is also saved as a dictionary of dictionaries
>>> S
{'N1': {'N2': 0.6963451220087373, 'N3': 0.055394034573921835}, 'N2': {'N1': 0.6963451220087373, 'N3': 0.6963451220087373}, 'N3': {'N2': 0.6963451220087373, 'N1': 0.055394034573921835}}
The direct association network is merged with the total association network and the edges are extracted as a list of tuples containing the source, target, and weight of each edge
S2 = nh.merge(G, S)
St = fh.save_sorted_turple(S2, in_file="result/demo.txt")
The purpose is to retain the same number of edges as the original observed total network as required by the DREAM5 network inference challenge. For this demo example, it has no effect
>>> S2
{'N1': {'N2': 0.6963451220087373, 'N3': 0.055394034573921835}, 'N2': {'N1': 0.6963451220087373, 'N3': 0.6963451220087373}, 'N3': {'N2': 0.6963451220087373, 'N1': 0.055394034573921835}}
>>> St
[('N1', 'N2', 0.6963451220087373), ('N2', 'N3', 0.6963451220087373), ('N1', 'N3', 0.055394034573921835)]
Finally, the direct association network is saved in the file
result/demo_res.txt
fh.save_file_weighted_edges(St, "result/demo_res.txt")
The content of the output file result/demo_res.txt
is shown below
N1 N2 0.696345122
N2 N3 0.696345122
N1 N3 0.055394035
Scripts used to generate the results in the manuscript are located in the
script
folder. Main results used by the scripts are located in the result
folder. Due to the size limitation, other intermediate results are saved in a
zipped data folder iDIRECT_out
downloadable at
http://ieg4.rccc.ou.edu/publication/iDIRECT/iDIRECT_out.zip. To use it, please download and
decompress it to the same parent folder as the iDIRECT
project folder. The
overall folder layout should be
├─ iDIRECT
| ├─ result
| ├─ script
| └─ ...
└─ iDIRECT_out
Simulated networks for assessing the performance of various network inference methods, including iDIRECT, Network Deconvolution (ND), Global Silencing (GS), and Partial Correlation (PC).
The relevant script files are located in the iDIRECT/script
folder. Their
locations and functions are summarized below.
File | Note |
---|---|
complex_simulated_data.py |
Network simulation, iDIRECT and PC (partial correlation) solutions generation |
complex_simulated_ndgs.m |
ND (network deconvolution) and GS (global silencing) solutions generation |
complex_simulated_curve.py |
PR (precision-recall) curves calculation |
complex_simulated_aupr.py |
AUPR (area under precision-recall curves) calculation |
Some of the result files are saved in the result/complex_simulated
folder. All
the other result files are saved in the downloadable data folder iDIRECT_out
.
Their locations and functions are summarized below.
Path | Note |
---|---|
complex_simulated/raw |
Original networks containing indirect relationships |
complex_simulated/true |
Corresponding true links |
complex_simulated/idirect(nd,gs,pc) |
iDIRECT (ND/GS/PC) solutions |
complex_simulated/curves |
PR and ROC curves |
complex_simulated/PR_ROC_stats.txt |
AUPR and AUROC results |
The procedure follows that of the Figure 2. Modifications on scripts
script/complex_simulated_data.py
, script/complex_simulated_curve.py
, and
script/complex_simulated_aupr.py
are made to allow multiple network sizes. The
plot is generated using script/complex_simulated_aupr_plot.py
.
Reconstruction of an in silico gene regulatory network and the genome-scale transcriptional regulatory network in E. coli from the DREAM5 Network Inference Challenge (https://www.synapse.org/#!Synapse:syn2820440/wiki/). The corresponding gene expression data for the in silico network was simulated using GeneNetWeaver (GNW) version 3.0 (http://gnw.sourceforge.net/). The E. coli gene regulatory network was reconstructed from chip-based gene expression data, respectively.
The challenge organizers provided the scripts used to evaluate the performance of submitted networks.
Folder | Note |
---|---|
script/matlab |
MATLAB scripts used to evaluate submitted networks |
script/INPUT |
Input folder containing gold standards and predicted edges |
script/OUTPUT |
Output folder storing network scores |
The scripts for ND and GS are also saved in the script
folder.
File/Folder | Note |
---|---|
script/ND.m |
Main function from ND for symmetric matrices |
script/ND_regulatory.txt |
Main function from ND for asymmetric matrices |
script/SILENCING.m |
Main function from GS |
script/ND_author |
Scripts provided by ND authors |
script/GS_author |
Scripts provided by GS authors including a wrapper script to run GS |
The scripts used to generated iDIRECT, ND, and GS solutions are located in the
script
folder. The locations and functions of them are summarized below.
File | Note |
---|---|
script/dream5_challenge_direct.py |
iDIRECT solution for each submission |
script/dream5_challenge_ndgs.m |
ND and GS solutions for each submission |
script/dream5_challenge_score.m |
Network score evaluation |
script/dream5_challenge_figtab.py |
Summary of the results |
The network scores and the precision-recall curves are saved in the
result/dream5_challenge
folder.
File | Note |
---|---|
result/dream5_challenge/score_all_sub.txt |
Network scores for different methods and submissions |
result/dream5_challenge/score_sub_part.txt |
Summary of the key network scores |
result/dream5_challenge/PR_curve_Other2.txt |
Precision-Recall curves for the best performer in E. coli network |
The reordered edges are saved in the dream5_challenge
folders from the
downloadable data folder iDIRECT_out
.
Folder | Note |
---|---|
iDIRECT_out/dream5_challenge/raw |
Original edge list for each submission |
iDIRECT_out/dream5_challenge/trans |
Transformed edges for each submission |
iDIRECT_out/dream5_challenge/idirect(nd,gs_author) |
iDIRECT(or ND, or GS)-processed submissions |
A subset of submissions are included to create partial community networks. These networks are scored using the scripts provided by DREAM5 challenge organizers. The locations and functions of relevant scripts are summarized below.
File | Note |
---|---|
script/dream5_challenge_assem.py |
Community networks from random combination |
script/dream5_challenge_assem_score.m |
Network score evaluation |
script/dream5_challenge_assem_plot.m |
Display the results |
The randomized assembly networks are saved in folders from the downloadable data
folder iDIRECT_out
. A summary of all network scores is saved in
result/dream5_challenge/score_all_sub.txt
.
Folder | Note |
---|---|
iDIRECT_out/dream5_challenge/raw/Assembly |
Community networks from original submissions |
iDIRECT_out/dream5_challenge/idirect(nd,gs_author)/Assembly |
Community networks from iDIRECT(or ND, or GS)-processed submissions |
Random networks are generated using random weight exchanges. Then they are scored using the scripts provided by DREAM5 challenge organizers. The locations and functions of relevant scripts are summarized below.
File | Note |
---|---|
script/dream5_challenge_rand.py |
Random networks from random weight exchanges |
script/dream5_challenge_rand_score.m |
Random network score evaluation |
script/dream5_challenge_rand_comp.m |
Evaluation of the significance of the differences |
The randomized networks are saved in folders from the downloadable data folder
iDIRECT_out
. A summary of all network scores is saved in
result/dream5_challenge/score_rand.txt
.
Folder | Note |
---|---|
iDIRECT_out/dream5_challenge/rand/raw |
Random networks from original submissions |
iDIRECT_out/dream5_challenge/rand/idirect |
Random networks from iDIRECT-processed submissions |
iDIRECT_out/dream5_challenge/rand/nd |
Random networks from ND-processed submissions |
iDIRECT_out/dream5_challenge/rand/gs_author |
Random networks from GS-processed submissions |
Molecular Ecological Networks (MENs) of soil microbial communities in response to in situ experimental warming, as discussed in subsection Application to microbial community networks in response to warming of the Results section in the main text.
The raw OTU tables and CLR-transformed (Centred Log-Ratio) OTU tables can be
found in the result/seasonal_warming
folder. Their functions are summarized
below. The MENs are constructed using the Molecular Ecological Network Analysis
Pipeline at http://ieg4.rccc.ou.edu/MENA/, which is publicly accessible.
File | Note |
---|---|
16S_OK5Y_warming.txt |
OTU table under warming treatment |
16S_OK5Y_control.txt |
OTU table under control treatment |
clr_16S_OK5Y_warming.txt |
CLR-transformed OTU table under warming treatment |
clr_16S_OK5Y_control.txt |
CLR-transformed OTU table under control treatment |
The resulting node and edge list files are saved in the
result/seasonal_warming
folder. Files with names containing '_new' contains
information that are used for GePhi visualization (https://gephi.org/).
File | Note |
---|---|
16S_OK5Y_warming 0.71 node(edge)_attribute.txt |
Nodal (or edge) attributes for the original network under warming treatment |
16S_OK5Y_warming 0.61 node(edge)_attribute.txt |
Nodal (or edge) attributes for the iDIRECT-processed network under warming treatment |
16S_OK5Y_control 0.71 node(edge)_attribute.txt |
Nodal (or edge) attributes for the original network under control treatment |
16S_OK5Y_control 0.61 node(edge)_attribute.txt |
Nodal (or edge) attributes for the iDIRECT-processed network under control treatment |
The proportions of simulated species extinction triggered by random species
removal and targeted species removal are collected in the
result/seasonal_warming/knockout_vulner/simuresult
folder.
File | Note |
---|---|
simuresult_random_deletion.csv |
Simulated species extinction triggered by random species removal |
simuresult_target_deletion.csv |
Simulated species extinction triggered by targeted species removal |
The nodal properties can be found in the node list files in the
result/seasonal_warming
folder. The relevant column is "node.degree".
File | Note |
---|---|
16S_OK5Y_warming 0.71 node_attribute_new.txt |
Nodal attributes for the original network under warming treatment |
16S_OK5Y_warming 0.61 node_attribute_new.txt |
Nodal attributes for the iDIRECT-processed network under warming treatment |
16S_OK5Y_control 0.71 node_attribute_new.txt |
Nodal attributes for the original network under control treatment |
16S_OK5Y_control 0.61 node_attribute_new.txt |
Nodal attributes for the iDIRECT-processed network under control treatment |
The plot was generated with MENAP (http://ieg4.rccc.ou.edu/MENA/), and the
accompanying files containing module separation results can be found in the
result/seasonal_warming
folder.
Folder | Note |
---|---|
16S_OK5Y_warming ME_results 0.71 |
Module separation for the original network under warming treatment |
16S_OK5Y_warming ME_results 0.61 |
Module separation for the iDIRECT-processed network under warming treatment |
16S_OK5Y_control ME_results 0.71 |
Module separation for the original network under control treatment |
16S_OK5Y_control ME_results 0.61 |
Module separation for the iDIRECT-processed network under control treatment |
The nodal properties can be found in the node list files in the
result/seasonal_warming
folder. The relevant columns are "Zi", "Pi", and "Role".
File | Note |
---|---|
16S_OK5Y_warming 0.71 node_attribute_new.txt |
Nodal attributes for the original network under warming treatment |
16S_OK5Y_warming 0.61 node_attribute_new.txt |
Nodal attributes for the iDIRECT-processed network under warming treatment |
16S_OK5Y_control 0.71 node_attribute_new.txt |
Nodal attributes for the original network under control treatment |
16S_OK5Y_control 0.61 node_attribute_new.txt |
Nodal attributes for the iDIRECT-processed network under control treatment |
Fig. S9. Comparison of correlations between keystone OTUs abundance and soil, plant and ecosystem functioning variables
The raw OTU tables, CLR-transformed (Central Log-Ratio) OTU tables, the soil,
plant and ecosystem functioning variables, and the keystone OTUs can be found in
the result/seasonal_warming
folder. The script used to perform the calculation
is script/seasonal_env_keystone.py
.
File | Note |
---|---|
16S_OK5Y_warming.txt |
OTU table under warming treatment |
16S_OK5Y_control.txt |
OTU table under control treatment |
clr_16S_OK5Y_warming.txt |
CLR-transformed OTU table under warming treatment |
clr_16S_OK5Y_control.txt |
CLR-transformed OTU table under control treatment |
OK5Y_warming_EnvFac.txt |
Soil, plant, and ecosystem functioning variables under warming treatment |
OK5Y_control_EnvFac.txt |
Soil, plant, and ecosystem functioning variables under control treatment |
Keystone_OTUs.txt |
Keystone OTUs |
The OTU significance results and the nodal properties can be found in the
result/seasonal_warming
folder. The script used to perform the calculation is
script/seasonal_env_keystone.py
.
File | Note |
---|---|
16S_OK5Y_warming GS_1565125339.txt |
Gene significance results for networks under warming treatment |
16S_OK5Y_control GS_1565124793.txt |
Gene significance results for networks under control treatment |
16S_OK5Y_warming 0.71 node_attribute_new.txt |
Nodal attributes for the original network under warming treatment |
16S_OK5Y_warming 0.61 node_attribute_new.txt |
Nodal attributes for the iDIRECT-processed network under warming treatment |
16S_OK5Y_control 0.71 node_attribute_new.txt |
Nodal attributes for the original network under control treatment |
16S_OK5Y_control 0.61 node_attribute_new.txt |
Nodal attributes for the iDIRECT-processed network under control treatment |
The script to run the robustness analysis is script/seasonal_net_knockout.R
.
The input files can be found in the result
folder and downloadable data folder
iDIRECT_out
Folder | Note |
---|---|
result/seasonal_warming/knockout_vulner/ |
Simulated species extinction triggered by random species removal |
iDIRECT_out/seasonal_warming/knockout_vulner/ |
Simulated species extinction triggered by targeted species removal |
The proportions of simulated species extinction triggered by random species
removal and targeted species removal are collected in the
result/seasonal_warming/knockout_vulner/simuresult
folder.
File | Note |
---|---|
simuresult_random_deletion.csv |
Simulated species extinction triggered by random species removal |
simuresult_target_deletion.csv |
Simulated species extinction triggered by targeted species removal |
The results are saved in the result/seasonal_warming
folder. The global
properties are saved in files 16S_OK5Y_*** *** global.prop.rep
.
File/Folder | Note |
---|---|
16S_OK5Y_warming 0.71 global.prop.rep |
Topological properties for the original network under warming treatment |
16S_OK5Y_warming 0.61 global.prop.rep |
Topological properties for the iDIRECT-processed network under warming treatment |
16S_OK5Y_control 0.71 global.prop.rep |
Topological properties for the original network under control treatment |
16S_OK5Y_control 0.61 global.prop.rep |
Topological properties for the iDIRECT-processed network under control treatment |
The corresponding global properties of 100 random networks are saved as tab-
separated text files summary.out
in the folders
result/seasonal_warming/16S_OK5Y_*** random_network ***
.
File/Folder | Note |
---|---|
16S_OK5Y_warming random_network 0.71 |
Randomized original network under warming treatment |
16S_OK5Y_warming random_network 0.61 |
Randomized iDIRECT-processed network under warming treatment |
16S_OK5Y_control random_network 0.71 |
Randomized original network under control treatment |
16S_OK5Y_control random_network 0.61 |
Randomized iDIRECT-processed network under control treatment |
The module separation was run with MENAP (http://ieg4.rccc.ou.edu/MENA/). The
results can be found in result/seasonal_warming/16S_OK5Y_*** ME_results ***
.
Folder | Note |
---|---|
16S_OK5Y_warming ME_results 0.71 |
Original network under warming treatment |
16S_OK5Y_warming ME_results 0.61 |
iDIRECT-processed network under warming treatment |
16S_OK5Y_control ME_results 0.71 |
Original network under control treatment |
16S_OK5Y_control ME_results 0.61 |
iDIRECT-processed network under control treatment |
The conditioning number of the total association matrix increases significantly as the size of the network increases and the number of samples is fixed, as discussed in Appendex A.2.
The eigenvalues of the corresponding matrices are stored in JSON as a text file
in result/example_condit/ex_math_conditioning.txt
. The full matrices with
different size and association measures can be found as separate files in
the example_condit
folder in the downloadable data folder iDIRECT_out
.