
Collaboration with David Mulholland at Mount Sinai for invasive Bladder Cancer

Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

scRNAseq-BladderCancer

Collaboration with Bishoy Morris Faltas (bmf9003@med.cornell.edu) at Weill Cornell and David Mulholland (david.mulholland@mssm.edu) at Mount Sinai for the invasive bladder cancer project. Published in Nature Communications (2020): Epithelial plasticity can generate multi-lineage phenotypes in human and murine bladder cancers.

R Shiny app

Human bladder cancer scRNA-seq

Mouse bladder cancer scRNA-seq

METHOD

Single-cell RNA-seq data were pre-processed with the scater R package. Data normalization, unsupervised cell clustering, and differential expression analysis were carried out with the Seurat R package. Reference-based cell type annotation was carried out with the SingleR R package.

How to use this Script

Key Software Setup

R version 3.6.0
Seurat_3.0.3
MAST_1.10.0
scater_1.12.0
scran_1.12.1
SingleR_1.0.1
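
Before running the scripts, it can help to compare the installed package versions with the ones listed above. The following is a minimal sketch in base R; the helper name check_versions is hypothetical and not part of this repository.

```r
# Versions listed in the "Key Software Setup" section above.
required <- c(Seurat = "3.0.3", MAST = "1.10.0", scater = "1.12.0",
              scran = "1.12.1", SingleR = "1.0.1")

# Hypothetical helper: report the installed version of each package
# (NA if the package is not installed) next to the required one.
check_versions <- function(required) {
  installed <- vapply(names(required), function(pkg) {
    # system.file() returns "" when the package is not installed;
    # packageVersion() reads the DESCRIPTION file without loading the package.
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
  data.frame(package = names(required),
             required = unname(required),
             installed = unname(installed),
             matches = !is.na(installed) & installed == required,
             stringsAsFactors = FALSE)
}

print(R.version.string)
print(check_versions(required))
```

Newer versions may work as well, but function names and defaults differ between major Seurat releases, so a mismatch is worth noting before debugging anything else.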

After cloning this repository, create the folders data and output in the top-level working folder. Move the Cell Ranger analysis results into the data folder. Tree structure of directory:

1. scater.R

scater.R for human
scater.R for mouse
These scripts perform initial quality control and remove low-quality cells.

After running these two scripts, sce_list_Human_{date}.Rda and sce_list_Mouse_{date}.Rda files will be generated inside

2. Seurat_setup.R

Seurat_setup.R for human
Seurat_setup.R for mouse

Cells with fewer than 800 genes, fewer than 1500 UMIs, or more than 15% mitochondrial gene content were excluded from the analysis. Raw gene expression counts were normalized with a global-scaling method, using a scale factor of 10,000 followed by natural-log transformation, via the Seurat NormalizeData function. The top 2000 highly variable genes were selected based on the expression and dispersion (variance/mean) of genes, followed by canonical correlation analysis (CCA) to identify common sources of variation between the patient and normal datasets. The first 20 CCA dimensions were used to generate two-dimensional t-Distributed Stochastic Neighbor Embedding (tSNE) plots and to cluster cells with a shared nearest neighbor (SNN) modularity optimization based clustering algorithm.
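
The cell filter and the normalization described above can be illustrated in base R on a toy count matrix. This is only a sketch of the arithmetic, not the code in Seurat_setup.R: the function names filter_cells and log_normalize, the toy values, and the mitochondrial gene pattern ("^MT-", which would typically be "^mt-" for mouse) are assumptions made for illustration.

```r
# Keep cells that pass the three thresholds described above.
# `counts` is a genes x cells matrix of raw UMI counts.
filter_cells <- function(counts, min_genes = 800, min_umis = 1500,
                         max_pct_mito = 15, mito_pattern = "^MT-") {
  n_genes <- colSums(counts > 0)               # detected genes per cell
  n_umis <- colSums(counts)                    # total UMIs per cell
  is_mito <- grepl(mito_pattern, rownames(counts))
  pct_mito <- 100 * colSums(counts[is_mito, , drop = FALSE]) / n_umis
  keep <- n_genes >= min_genes & n_umis >= min_umis & pct_mito <= max_pct_mito
  keep <- !is.na(keep) & keep                  # cells with zero UMIs are dropped
  counts[, keep, drop = FALSE]
}

# Global-scaling normalization: divide by the cell total, multiply by the
# scale factor, then take the natural log of (value + 1).
log_normalize <- function(counts, scale_factor = 10000) {
  log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
}

# Toy data (made-up values); thresholds are lowered so the example is small.
toy <- matrix(c(10, 5, 4, 1,
                 0, 1, 0, 0,
                 5, 5, 2, 8),
              nrow = 4,
              dimnames = list(c("GeneA", "GeneB", "GeneC", "MT-CO1"),
                              c("cell1", "cell2", "cell3")))

filtered <- filter_cells(toy, min_genes = 3, min_umis = 10)
normalized <- log_normalize(filtered)
print(colnames(filtered))
print(normalized)
```

In the toy matrix, cell2 fails the gene and UMI thresholds and cell3 fails the mitochondrial threshold (8 of 20 UMIs), so only cell1 is kept.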

Modify the date in the code to match the {date} in the file names generated by the previous step. After running these two scripts, BladderCancer_H2_{date}.Rda and BladderCancer_M2_{date}.Rda files will be generated inside the data folder. Do not modify any files in the data folder.
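
One way to avoid editing the date by hand in several places is to build the file names from a single date tag. The sketch below assumes a YYYYMMDD tag; the actual format of {date} used by the scripts is not stated here, and the helper name rda_path is hypothetical. The example writes to a temporary folder so that nothing in the real data folder is touched.

```r
# Build the path of a date-stamped .Rda file inside a folder.
# `prefix`, `date_tag` and `dir` are illustrative argument names.
rda_path <- function(prefix, date_tag, dir = "data") {
  file.path(dir, paste0(prefix, "_", date_tag, ".Rda"))
}

# Assumed date format (YYYYMMDD); use whatever tag the earlier step produced.
date_tag <- format(Sys.Date(), "%Y%m%d")

# Save/load round trip in a temporary folder.
tmp_dir <- tempfile("data_")
dir.create(tmp_dir)
toy_object <- list(note = "placeholder for a processed object")
path <- rda_path("BladderCancer_H2", date_tag, dir = tmp_dir)
save(toy_object, file = path)

# load() restores objects under their original names; loading into a
# separate environment avoids overwriting objects in the workspace.
restored <- new.env()
load(path, envir = restored)
print(basename(path))
print(identical(restored$toy_object, toy_object))
```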

3. SingleR.R

SingleR.R for Human
SingleR.R for Mouse
Cell types were identified with the SingleR (Single-cell Recognition) package. SingleR is a computational method for unbiased cell type recognition in scRNA-seq data. It leverages reference transcriptomic datasets of pure cell types to infer the cell of origin of each single cell independently.

After running these scripts, singler_BladderCancer_H2_{date}.RData and singler_BladderCancer_M2_{date}.RData files will be generated inside the output folder.
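
The idea behind reference-based annotation can be illustrated with a deliberately simplified base R sketch: each cell is correlated (Spearman) with every reference profile and receives the label of the best match. This is not the SingleR implementation, which adds further steps such as gene selection and fine-tuning; the function name annotate_by_correlation, the gene names, the labels, and the values below are all made up for illustration.

```r
# Assign each test cell the label of the most correlated reference profile.
# `test_expr`: genes x cells matrix; `ref_expr`: genes x cell types matrix.
annotate_by_correlation <- function(test_expr, ref_expr) {
  shared <- intersect(rownames(test_expr), rownames(ref_expr))
  # cor() on two matrices correlates columns of the first with columns of
  # the second, giving a cells x cell types matrix.
  cors <- cor(test_expr[shared, , drop = FALSE],
              ref_expr[shared, , drop = FALSE],
              method = "spearman")
  labels <- colnames(ref_expr)[apply(cors, 1, which.max)]
  names(labels) <- colnames(test_expr)
  labels
}

genes <- c("G1", "G2", "G3", "G4")

# Hypothetical reference profiles of two pure cell types.
ref <- cbind(Epithelial = c(10, 8, 1, 0),
             T_cell     = c(0, 1, 9, 10))
rownames(ref) <- genes

# Hypothetical single cells to annotate.
cells <- cbind(cellA = c(7, 9, 0, 1),
               cellB = c(1, 0, 6, 12))
rownames(cells) <- genes

labels <- annotate_by_correlation(cells, ref)
print(labels)
```

Because every cell is scored on its own, the result does not depend on how the cells were clustered, which is the sense in which the annotation is independent per cell.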

4. Identify_Cell_Types_Manually.R

Identify_Cell_Types_Manually.R for Human
Identify_Cell_Types_Manually.R for Mouse
All clusters are tested against marker genes and gene sets.

Multiple plots and tables will be generated; save them when necessary.
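
As a rough illustration of testing clusters against a marker gene set, the base R sketch below averages the expression of a marker set per cell and then per cluster. It is not the code in Identify_Cell_Types_Manually.R; the function name mean_marker_expression, the gene names, and the values are hypothetical.

```r
# Average expression of a marker gene set in each cluster.
# `expr`: genes x cells matrix of normalized expression;
# `clusters`: cluster label of each cell, in column order.
mean_marker_expression <- function(expr, clusters, markers) {
  markers <- intersect(markers, rownames(expr))   # ignore missing genes
  per_cell <- colMeans(expr[markers, , drop = FALSE])
  vapply(split(per_cell, clusters), mean, numeric(1))
}

# Toy data (made-up values): three genes, four cells, two clusters.
expr <- rbind(GeneA = c(4, 6, 0, 0),
              GeneB = c(2, 4, 0, 2),
              GeneC = c(0, 0, 5, 5))
colnames(expr) <- paste0("cell", 1:4)
clusters <- c("0", "0", "1", "1")

scores <- mean_marker_expression(expr, clusters, markers = c("GeneA", "GeneB"))
print(scores)
```

A cluster whose score is clearly higher than the others is a candidate for the cell type that the marker set represents; here cluster 0 scores 4 and cluster 1 scores 0.5.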

5. Differential_analysis.R

Differential_analysis.R for Human
Differential_analysis.R for Mouse
FindAllMarkers.UMI() is a modified version of FindAllMarkers(). It generates a similar data frame plus two extra columns, UMI.1 and UMI.2, that record nUMI. UMI.1 is the average nUMI of the current cluster; UMI.2 is the average nUMI of all remaining clusters.
FindAllMarkers(object, test.use = "MAST"): MAST (Model-based Analysis of Single Cell Transcriptomics) is a GLM framework that treats cellular detection rate as a covariate (Finak et al., Genome Biology, 2015).
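
The two extra columns can be illustrated with a small base R sketch. It assumes that the averages are taken per gene over the cells of the current cluster and over all other cells; the scale on which FindAllMarkers.UMI() computes them is not stated here, so treat the helper umi_summary and the toy values as hypothetical.

```r
# Per-gene average UMI count inside one cluster (UMI.1) and in all
# remaining cells (UMI.2).
# `counts`: genes x cells matrix; `clusters`: cluster label of each cell.
umi_summary <- function(counts, clusters, cluster_id) {
  in_cluster <- clusters == cluster_id
  data.frame(gene = rownames(counts),
             UMI.1 = rowMeans(counts[, in_cluster, drop = FALSE]),
             UMI.2 = rowMeans(counts[, !in_cluster, drop = FALSE]),
             cluster = cluster_id,
             row.names = NULL,
             stringsAsFactors = FALSE)
}

# Toy data (made-up values): two genes, four cells, two clusters.
counts <- rbind(GeneA = c(4, 6, 0, 2),
                GeneB = c(1, 1, 3, 5))
colnames(counts) <- paste0("cell", 1:4)
clusters <- c("0", "0", "1", "1")

res <- umi_summary(counts, clusters, cluster_id = "0")
print(res)
```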

Below is an example of a Differential analysis output file.

gene p_val avg_logFC pct.1 pct.2 p_val_adj UMI.1 UMI.2 cluster
Psca 0 3.9340 0.939 0.055 0 3.5565 0.0339 0
Ppbp 0 2.9163 0.99 0.161 0 3.0622 0.1834 0
Ltf 0 2.9105 0.959 0.042 0 2.6070 0.0365 0
Ecm1 0 2.7729 0.965 0.072 0 2.6652 0.0931 0
Gsto1 0 2.7221 0.995 0.035 0 2.7625 0.0496 0

The results data frame has the following columns:

gene: gene name.
p_val: p_val is calculated using MAST (Model-based Analysis of Single Cell Transcriptomics, Finak et al., Genome Biology, 2015)
avg_logFC: log fold-change of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group.
pct.1: the fraction of cells (between 0 and 1) in which the gene is detected in the first group.
pct.2: the fraction of cells (between 0 and 1) in which the gene is detected in the second group.
p_val_adj: adjusted p-value, based on Bonferroni correction.
UMI.1: average nUMI of the current cluster.
UMI.2: average nUMI of the rest of the clusters.
cluster: either cell type or corresponding cluster.
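
To make the column definitions concrete, the sketch below computes pct.1, pct.2, avg_logFC and p_val_adj in base R for a toy matrix. The p-values are supplied by hand rather than computed with MAST, and the avg_logFC formula (natural log of the mean of un-logged expression plus a pseudocount of 1, first group minus second group) is an assumption about how Seurat 3 derives this column. The function name marker_stats and all values are hypothetical.

```r
# Summary columns for one cluster versus all remaining cells.
# `norm_expr`: genes x cells matrix of log-normalized expression;
# `p_val`: one unadjusted p-value per gene (supplied, not computed here).
marker_stats <- function(norm_expr, clusters, cluster_id, p_val,
                         n_genes_tested = nrow(norm_expr)) {
  in_cluster <- clusters == cluster_id
  g1 <- norm_expr[, in_cluster, drop = FALSE]    # first group
  g2 <- norm_expr[, !in_cluster, drop = FALSE]   # second group
  data.frame(
    gene = rownames(norm_expr),
    p_val = p_val,
    # Assumed formula: difference of log(mean(expm1(x)) + 1) between groups.
    avg_logFC = log(rowMeans(expm1(g1)) + 1) - log(rowMeans(expm1(g2)) + 1),
    pct.1 = rowMeans(g1 > 0),                    # fraction of cells detected
    pct.2 = rowMeans(g2 > 0),
    # Bonferroni: multiply by the number of tests and cap at 1.
    p_val_adj = p.adjust(p_val, method = "bonferroni", n = n_genes_tested),
    cluster = cluster_id,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy data (made-up values): two genes, four cells, two clusters.
norm_expr <- log1p(rbind(GeneA = c(3, 1, 0, 0),
                         GeneB = c(0, 1, 1, 1)))
colnames(norm_expr) <- paste0("cell", 1:4)
clusters <- c("0", "0", "1", "1")

stats <- marker_stats(norm_expr, clusters, cluster_id = "0",
                      p_val = c(0.01, 0.6))
print(stats)
```

A positive avg_logFC together with a high pct.1 and a low pct.2, as for GeneA in this toy example, is the pattern expected of a marker for the first group.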