Motif-Informed Network Inference based on single-cell EXpression data
The pipeline is built using Nextflow DSL2 and infers cell-type-specific gene regulatory networks from scRNA-Seq data in plants.
MINI-EX is released under the GNU General Public License version 3 as part of a dual license, which allows the software to be distributed under a proprietary model as well as an open-source model.
Reference: "MINI-EX: Integrative inference of single-cell gene regulatory networks in plants", Ferrari et al. 2022, Molecular Plant
- Run expression-based GRN inference (GRNBoost2) given a list of TFs and a gene-to-cell count matrix
- Run TFBS enrichment on the expression-based regulons
- Filter the TFBS-enriched regulons for TF or TF-Family motifs (default TF-Family)
- Filter the previously identified regulons by target genes' expression among the defined cell clusters (cell cluster enrichment)
- Filter the cell cluster specific regulons by TF expression
- Calculate network centrality measures (out-degree, betweenness, closeness)
- Calculate functional enrichment of the target genes of each regulon (if a list of expected GO terms is provided)
- Generate a list of ranked regulons based on Borda ranking
If a list of expected GO terms is provided:
- First, all the combinations of weighted metrics (4 network centrality measures, q-value from the cell cluster enrichment, q-value from the functional enrichment) are evaluated
- The combination that retrieves half of the expected regulons earliest in the ranking (R50) is chosen for the weighted Borda ranking
Otherwise:
- The 4 network centrality measures and the q-value from the cell cluster enrichment are used to calculate the Borda ranking (calculated on the geometric mean of the single metrics)
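The ranking step can be illustrated with a minimal Python sketch. It is not the MINI-EX implementation: it assumes that each metric is first converted to a per-regulon rank and that the Borda score is the geometric mean of those ranks, which is only one reading of the description above. The regulon names, metric names and values are made up, and `r50` is a hypothetical helper that assumes R50 is the rank position at which half of the expected regulons have been retrieved.

```python
from math import ceil, prod


def rank_by_metric(values, higher_is_better):
    """Return 1-based ranks (1 = best); tied values share the same rank."""
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=higher_is_better)
    ranks = {}
    prev_value, prev_rank = None, 0
    for position, (name, value) in enumerate(ordered, start=1):
        if value != prev_value:
            prev_rank, prev_value = position, value
        ranks[name] = prev_rank
    return ranks


def borda_geometric(metrics, higher_is_better):
    """Rank regulons by the geometric mean of their per-metric ranks.

    metrics: {metric_name: {regulon: value}}
    higher_is_better: {metric_name: bool}, False for q-values
    """
    per_metric = {m: rank_by_metric(v, higher_is_better[m]) for m, v in metrics.items()}
    regulons = next(iter(metrics.values())).keys()
    score = {}
    for regulon in regulons:
        ranks = [per_metric[m][regulon] for m in metrics]
        score[regulon] = prod(ranks) ** (1.0 / len(ranks))
    # A lower geometric mean of ranks means a better regulon.
    return sorted(score, key=score.get), score


def r50(ranking, expected):
    """Rank position at which half of the expected regulons are retrieved (assumed definition)."""
    present = [r for r in ranking if r in expected]
    needed = ceil(len(present) / 2)
    found = 0
    for position, regulon in enumerate(ranking, start=1):
        found += regulon in expected
        if needed and found >= needed:
            return position
    return None


# Illustrative values only.
metrics = {
    "out_degree": {"TF_A": 120, "TF_B": 45, "TF_C": 80},
    "betweenness": {"TF_A": 0.30, "TF_B": 0.05, "TF_C": 0.10},
    "cluster_qvalue": {"TF_A": 1e-8, "TF_B": 1e-3, "TF_C": 1e-12},
}
higher_is_better = {"out_degree": True, "betweenness": True, "cluster_qvalue": False}

ranking, score = borda_geometric(metrics, higher_is_better)
print(ranking)
print(r50(ranking, expected={"TF_B", "TF_C"}))
```

Under these assumptions, comparing weighted metric combinations would amount to computing `r50` for each candidate ranking and keeping the combination with the smallest value.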
- Gene-to-cell count matrix (genes as rows and cells as columns)
- List of TFs
- Seurat output from FindAllMarkers
- Tab-separated file containing the cluster identity of each cell (cell_barcode \t cluster_id)
- Tab-separated file containing the cluster annotation (cluster_id \t cluster_annotation)
- (Optional) List of GO terms of interest
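Because several of these files have to agree with each other, a quick consistency check before launching the pipeline can save a failed run. The sketch below is not part of MINI-EX. It assumes the count matrix is tab-separated with a header row whose first field labels the gene column, and that the cell and annotation files have no header; adjust the parsing if your files differ. The function names are hypothetical.

```python
import csv
import io


def read_cell_clusters(handle):
    """Parse 'cell_barcode<TAB>cluster_id' lines into a dict."""
    return {row[0]: row[1] for row in csv.reader(handle, delimiter="\t") if row}


def matrix_barcodes(handle):
    """Return the cell barcodes from the header row of a genes-by-cells matrix."""
    header = next(csv.reader(handle, delimiter="\t"))
    return header[1:]  # assumption: the first header field labels the gene column


def check_inputs(matrix_handle, cells_handle, annotation_handle):
    """Report cells without a cluster and clusters without an annotation."""
    cell_to_cluster = read_cell_clusters(cells_handle)
    annotated = {row[0] for row in csv.reader(annotation_handle, delimiter="\t") if row}
    barcodes = matrix_barcodes(matrix_handle)
    missing_cells = [b for b in barcodes if b not in cell_to_cluster]
    missing_clusters = sorted(set(cell_to_cluster.values()) - annotated)
    return missing_cells, missing_clusters


# Tiny in-memory stand-ins for the real files (illustrative content only).
matrix = io.StringIO("gene\tcell1\tcell2\tcell3\nAT1G01010\t0\t3\t1\n")
cells = io.StringIO("cell1\t0\ncell2\t1\n")
annotation = io.StringIO("0\tcortex\n")

missing_cells, missing_clusters = check_inputs(matrix, cells, annotation)
print("cells without cluster:", missing_cells)
print("clusters without annotation:", missing_clusters)
```

With real data, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.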
As the pipeline can be run in parallel for multiple datasets, all the inputs can be provided as paths to the dedicated directories.
All input files should have specific extensions and names as shown here.
- regulons_output folder containing a tab-separated file with the inferred regulons and an Excel file with the ranked regulons and relative metadata
- figures folder containing a clustermap reporting the distribution of the different regulons across the cell clusters, and two heatmaps showing the cell cluster specificity and DE calls of the top 150 regulons, respectively.
- GOenrichment_output folder containing a tab-separated file with GO enrichment for the different regulons with relative statistics
- GRNBoost2_output folder containing a TF-TG tab-separated file resulting from the GRNBoost2 run
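As a way to inspect the TF-TG file, the following sketch counts the distinct target genes per TF (a simple out-degree). It assumes a headerless, tab-separated layout with the TF in the first column and the target gene in the second; the actual column order and any extra columns (for example an importance score) should be checked against your own output. This only summarizes the raw GRNBoost2 edges and is not how MINI-EX computes centrality on the final regulons.

```python
import csv
import io
from collections import defaultdict


def out_degree(handle):
    """Count distinct target genes per TF in a TF<TAB>TG[<TAB>...] edge list."""
    targets = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        tf, tg = row[0], row[1]
        targets[tf].add(tg)  # a set ignores duplicated edges
    return {tf: len(tgs) for tf, tgs in targets.items()}


# Illustrative edge list; the third column stands in for an assumed importance score.
edges = io.StringIO(
    "TF_A\tG1\t12.5\n"
    "TF_A\tG2\t3.1\n"
    "TF_B\tG1\t7.0\n"
    "TF_A\tG2\t3.1\n"
)

degrees = out_degree(edges)
print(degrees)
```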
A detailed overview of the necessary input files and expected output files can be found here.
Requirements:
How to run it:
- Define the paths to all the required inputs in the config file
nextflow -C miniex.config run miniex.nf