We have developed the CytoTalk algorithm for de novo construction of a signaling network between two cell types using single-cell transcriptomics data. This signaling network is the union of multiple signaling pathways originating at ligand-receptor pairs. Our algorithm constructs an integrated network of intracellular and intercellular functional gene interactions. A prize-collecting Steiner tree (PCST) algorithm is used to extract the signaling network, based on node prize (cell-specific gene activity) and edge cost (functional interaction between two genes). The objective of the PCSF problem is to find an optimal subnetwork in the integrated network that includes genes with high levels of cell-type-specific expression and close connection to highly active ligand-receptor pairs.
Signal transduction is the primary mechanism for cell-cell communication and scRNA-seq technology holds great promise for studying this communication at high levels of resolution. Signaling pathways are highly dynamic and cross-talk among them is prevalent. Due to these two features, simply examining expression levels of ligand and receptor genes cannot reliably capture the overall activities of signaling pathways and the interactions among them.
CytoTalk requires a Python module to operate correctly. To install the
pcst_fast
module, please
run this command before using CytoTalk:
pip install git+https://github.com/fraenkel-lab/pcst_fast.git
CytoTalk outputs a SIF file for use in Cytoscape. Please install
Cytoscape to view the whole output
network. Additionally, you’ll have to install Graphviz and add the dot
executable to your PATH. See the Cytoscape downloads
page for more information.
If you have devtools
installed, you can use the install_github
function directly on this repository:
devtools::install_github("tanlabcode/CytoTalk")
Let’s assume we have a folder called “scRNAseq-data”, filled with single-cell RNA sequencing datasets. Here’s an example directory structure:
── scRNAseq-data
├─ scRNAseq_BasalCells.csv
├─ scRNAseq_BCells.csv
├─ scRNAseq_EndothelialCells.csv
├─ scRNAseq_Fibroblasts.csv
├─ scRNAseq_LuminalEpithelialCells.csv
├─ scRNAseq_Macrophages.csv
└─ scRNAseq_TCells.csv
⚠ IMPORTANT ⚠
Notice all of these files have the prefix “scRNAseq_” and the extension “.csv”; CytoTalk looks for files matching this pattern, so be sure to replicate it with your filenames. Let’s try reading in the folder:
dir_in <- "~/Tan-Lab/scRNAseq-data"
lst_scrna <- CytoTalk::read_matrix_folder(dir_in)
table(lst_scrna$cell_types)
BasalCells BCells EndothelialCells
392 743 251
Fibroblasts LuminalEpithelialCells Macrophages
700 459 186
TCells
1750
The outputted names are all the cell types we can choose to run CytoTalk against. Alternatively, we can use CellPhoneDB-style input, where one file is our data matrix, and another file maps cell types to columns (i.e. metadata):
── scRNAseq-data-cpdb
├─ sample_counts.txt
└─ sample_meta.txt
There is no specific pattern required for this type of input, as both filepaths are required for the function:
fpath_mat <- "~/Tan-Lab/scRNAseq-data-cpdb/sample_counts.txt"
fpath_meta <- "~/Tan-Lab/scRNAseq-data-cpdb/sample_meta.txt"
lst_scrna <- CytoTalk::read_matrix_with_meta(fpath_mat, fpath_meta)
table(lst_scrna$cell_types)
Myeloid NKcells_0 NKcells_1 Tcells
1 5 3 1
If you have a SingleCellExperiment
object with logcounts
and
colnames
loaded onto it, you can create an input list like so:
lst_scrna <- CytoTalk::from_single_cell_experiment(sce)
Finally, you can compose your own input list quite easily, simply have a matrix of either count or transformed data and a vector detailing the cell types of each column:
mat <- matrix(rpois(90, 5), ncol = 3)
cell_types <- c("TypeA", "TypeB", "TypeA")
lst_scrna <- CytoTalk:::new_named_list(mat, cell_types)
table(lst_scrna$cell_types)
TypeA TypeB
2 1
Without further ado, let’s run CytoTalk!
# read in data folder
dir_in <- "~/Tan-Lab/scRNAseq-data"
lst_scrna <- CytoTalk::read_matrix_folder(dir_in)
# set required parameters
type_a <- "Fibroblasts"
type_b <- "LuminalEpithelialCells"
# run CytoTalk process
results <- CytoTalk::run_cytotalk(lst_scrna, type_a, type_b)
[1 / 8] (11:15:28) Preprocessing...
[2 / 8] (11:16:13) Mutual information matrix...
[3 / 8] (11:20:19) Indirect edge-filtered network...
[4 / 8] (11:20:37) Integrate network...
[5 / 8] (11:21:44) PCSF...
[6 / 8] (11:21:56) Determine best signaling network...
[7 / 8] (11:21:58) Generate network output...
[8 / 8] (11:21:59) Analyze pathways...
All we need for a default run is the named list and selected cell types
(“Macrophages” and “LuminalEpithelialCells”). The most important
optional parameters to look at are cutoff_a
, cutoff_b
, and
beta_max
; details on these can be found in the help page for the
run_cytotalk
function (see ?run_cytotalk
). As the process runs, we
see messages print to the console for each sub process.
Here is what the structure of the output list looks like (abbreviated):
str(results)
List of 5
$ params
$ pem
$ integrated_net
..$ nodes
..$ edges
$ pcst
..$ occurances
..$ ks_test_pval
..$ final_network
$ pathways
..$ raw
..$ graphs
..$ df_pval
In the order of increasing effort, let’s take a look at some of the
results. Let’s begin with the results$pathways
item. This list item
contains DiagrammeR
graphs, which are viewable in RStudio, or can be
exported if the dir_out
parameter is specified during execution. Here
is an example pathway neighborhood:
Note that the exported SVG files (see dir_out
parameter) are
interactive, with hyperlinks to GeneCards and WikiPI. Green edges are
directed from ligand to receptor. Additionally, if we specify an output
directory, we can see a “cytoscape” sub-folder, which includes a SIF
file read to import and two tables that can be attached to the network
and used for styling. Here’s an example of a styled Cytoscape network:
There are a number of details we can glean from these graphs, such as node prize (side of each node), edge cost (inverse edge width), Preferential Expression Measure (intensity of each color), cell type (based on color, and shape in the Cytoscape output), and interaction type (dashed lines for crosstalk, solid for intracellular).
If we want to be more formal with the pathway analysis, we can look at
some scores for each neighborhood in the results$pathways$raw
item.
This list provides extracted subnetworks, based on the final network
from the PCST. Additionally, the results$pathways$df_pval
item
contains a summary of the neighborhood size for each pathway, along with
theoretical (Gamma distribution) test values that are found by
contrsting the found pathway to random pathways from the integrated
network. p-values for node prize, edge cost, and potential are
calculated separately.
2021-11-30: The latest release “CytoTalk_v0.99.0” resets the versioning numbers in anticipation for submission to Bioconductor. This newest version packages functions in a modular fashion, offering more flexible input, usage, and output of the CytoTalk subroutines.
2021-10-07: The release “CytoTalk_v4.0.0” is a completely re-written R version of the program. Approximately half of the run time as been shaved off, the program is now cross-compatible with Windows and *NIX systems, the file space usage is down to roughly a tenth of what it was, and graphical outputs have been made easier to import or now produce portable SVG files with embedded hyperlinks.
2021-06-08: The release “CytoTalk_v3.1.0” is a major updated R version on the basis of v3.0.3. We have added a function to generate Cytoscape files for visualization of each ligand-receptor-associated pathway extracted from the predicted signaling network between the two given cell types. For each predicted ligand-receptor pair, its associated pathway is defined as the user-specified order of the neighborhood of the ligand and receptor in the two cell types.
2021-05-31: The release “CytoTalk_v3.0.3” is a revised R version on the basis of v3.0.2. A bug has been fixed in this version to avoid errors occurred in some special cases. We also provided a new example “RunCytoTalk_Example_StepByStep.R” to run the CytoTalk algorithm in a step-by-step fashion. Please download “CytoTalk_package_v3.0.3.zip” from the Releases page (https://github.com/huBioinfo/CytoTalk/releases/tag/v3.0.3) and refer to the user manual inside the package.
2021-05-19: The release “CytoTalk_v3.0.2” is a revised R version on the basis of v3.0.1. A bug has been fixed in this version to avoid running errors in some extreme cases. Final prediction results will be the same as v3.0.1. Please download the package from the Releases page (https://github.com/huBioinfo/CytoTalk/releases/tag/v3.0.2) and refer to the user manual inside the package.
2021-05-12: The release “CytoTalk_v3.0.1” is an R version, which is more easily and friendly to use!! Please download the package from the Releases page (https://github.com/huBioinfo/CytoTalk/releases/tag/v3.0.1) and refer to the user manual inside the package.
-
Hu Y, Peng T, Gao L, Tan K. CytoTalk: De novo construction of signal transduction networks using single-cell transcriptomic data. Science Advances, 2021, 7(16): eabf1356.
-
Hu Y, Peng T, Gao L, Tan K. CytoTalk: De novo construction of signal transduction networks using single-cell RNA-Seq data. bioRxiv, 2020.
- Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research, 2003, 13: 2498-2504.
Kai Tan, tank1@chop.edu