IRIS-FGM has been accepted by Bioconductor.
---Integrative scRNA-Seq Interpretation System for Functional Gene Module analysis
Single-cell RNA-Seq data is useful in discovering cell heterogeneity and signature genes in specific cell populations in cancer and other complex diseases. Specifically, the investigation of functional gene modules (FGM) can help to understand gene interactive networks and complex biological processes. QUBIC2 is recognized as one of the most efficient and effective tools for FGM identification from scRNA-Seq data. However, its availability is limited to a C implementation, and its applicative power is affected by only a few downstream analyses functionalities. We developed an R package named IRIS-FGM (integrative scRNA-Seq interpretation system for functional gene module analysis) to support the investigation of FGMs and cell clustering using scRNA-Seq data. Empowered by QUBIC2, IRIS-FGM can identify co-expressed and co-regulated FGMs, predict types/clusters, identify differentially expressed genes, and perform functional enrichment analysis. It is noteworthy that IRIS-FGM also applies Seurat objects that can be easily used in the Seurat vignettes.
- System environment:
We recommend users to install IRIS-FGM on large memory (32GB) based UNIX/Linux operation system if the user aims at analyzing bicluster-based co-expression analysis. If the user seeks to analyze data by quick mode, we recommend installing IRIS-FGM on a small memory (8GB) based Windows or Linux operation system; IRIS-FGM does not support MAC.
- R environment: R (equal or greater than 3.5)
- Install required packages from CRAN:
install.packages(c('BiocManager','devtools', 'AdaptGauss', "pheatmap", 'mixtools','MCL', 'anocva', 'qgraph','Rtools','ggpubr',"ggraph","Seurat","Polychrome"))
- Install required packages from Bioconductor:
BiocManager::install(c('org.Mm.eg.db','multtest','org.Hs.eg.db','clusterProfiler','DEsingle', 'DrImpute','scater', 'scran'))
- Install IRISFGM from github:
devtools::install_github("BMEngineeR/IRISFGM",ref="main")
Now the MetaQUBIC is successfully installed and ready for use.
IRIS-FGM can accepted 10X chromium input files, including a folder (contain gene name, cell name, and sparse matrix) and .h5 file.
- set working directory and import library
# dir.create("your working directory",recursive = TRUE)
# setwd("your working directory")
library(IRISFGM)
- Read from .h5 file.
ReadFrom10X_h5("~/5k_pbmc_protein_v3_filtered_feature_bc_matrix.h5")
- Read from 10X folder, which should contain three files (barcode, gene name, and sparse matrix)
ReadFrom10X_folder("~/hg19/")
- Read from .csv or .txt file
First, we should download data from the link, then we will use this data set as example to run the pipeline.
InputMatrix <- read.table(url("https://bmbl.bmi.osumc.edu/downloadFiles/Yan_expression.txt"),
header = TRUE,
row.names = 1,
check.names = FALSE)
- For the computational efficiency, we will use subsampling data, and create IRIS-FGM object.
set.seed(123)
seed_idx <- sample(1:nrow(InputMatrix),3000)
InputMatrix_sub <- InputMatrix[seed_idx,]
object <- CreateIRISFGMObject(InputMatrix_sub)
- Addmeta: this step can add customized cell label by user, the format of file passing to
meta.info
is data frame of which row name should be cell ID, and column name should be cell type.
my_meta <- read.table(url("https://bmbl.bmi.osumc.edu/downloadFiles/Yan_cell_label.txt"),header = TRUE,row.names = 1)
object <- AddMeta(object, meta.info = my_meta)
- plotmeta: plot meta information based on RNA count and Feature number. This step is for the following subset step in terms of filtering out low quality data.
PlotMeta(object)
- remove low quality data based on the previous plot.
object <- SubsetData(object , nFeature.upper=2000,nFeature.lower=250)
User can choose perform normalization or imputation based on their need. The normalization method has two options, one is the simplist CPM normalization (default normalization = 'cpm'
). The other is from package scran and can be opened by using parameter normalization = 'scran'
, . The imputation method is from package DrImpute and can be opened by using parameter IsImputation = TRUE
(default as closed).
object <- ProcessData(object, normalization = "cpm", IsImputation = FALSE)
The argument Gene_use = 500
is top 500 highlt variant genes which are selected to run LTMG. For quick mode, we recommend to use top 2000 gene (here we use top 500 gene for saving time). On the contrary, for co-expression gene analysis, we recommend to use all gene by changing Gene_use = "all"
.
# do not show progress bar
quiet <- function(x) {
sink(tempfile())
on.exit(sink())
invisible(force(x))
}
# demo only run top 500 gene for saving time.
object <- quiet(RunLTMG(object, Gene_use = "500"))
# you can get LTMG signal matrix
LTMG_Matrix <- GetLTMGmatrix(object)
LTMG_Matrix[1:5,1:5]
ID | OoCyte_1 | OoCyte_2 | OoCyte_3 | Zygote_2 | 2_Cell_embryo_1_Cell_1 |
---|---|---|---|---|---|
MTRNR2L2 | 1 | 2 | 2 | 2 | 2 |
TPT1 | 2 | 2 | 2 | 2 | 3 |
UBB | 2 | 2 | 2 | 2 | 2 |
FABP5 | 3 | 2 | 2 | 2 | 2 |
PTTG1 | 2 | 2 | 2 | 2 | 2 |
IRIS-FGM can provide biclustering function, which is based on our in-house novel algorithm,
QUBIC2 (https://github.com/maqin2001/qubic2). Here we will show the basic biclustering
usage of IRIS-FGM using a Gene_use = all
to generate LTMG matrix.
User can type the following command to run discretization (LTMG) + biclustering directly:
object <- CalBinaryMultiSignal(object)
object <- RunBicluster(object, DiscretizationModel = "LTMG",OpenDual = FALSE,
NumBlockOutput = 100, BlockOverlap = 0.5, BlockCellMin = 25)
Quantile-discretized bicluster (recommend for bulk RNA-Seq, microarray data, or large single cell RNA-Seq data)
This will output several files, and among them you will find one named tmp_expression.txt.chars.blocks
,which contains the predicted biclusters.
Or, user may use first version discretization strategy provided by QUBIC 1.0.
object <- RunDiscretization(object)
object <- RunBicluster(object, DiscretizationModel = "Quantile",OpenDual = FALSE,
NumBlockOutput = 100, BlockOverlap = 0.5, BlockCellMin = 25)
(The default parameters in IRIS-FGM are BlockCellMin=15, BlockOverlap=0.7, Extension=0.90, NumBlockOutput=100 you may use other parameters as you like, just specify them in the argument)
User can use reduction = "umap"
or reductopm = "tsne"
to perform dimension reduction.
# demo only run top 500 gene for saving time.
object <- RunDimensionReduction(object, mat.source = "UMImatrix",reduction = "umap")
object <- RunClassification(object, k.param = 20, resolution = 0.8, algorithm = 1)
The cell cluster prediction of IRIS-FGM is based on the biclustering results. In short, it will construct a weighted graph based on the biclusters and then do clustering on the weighted graph. To facilitate the process, we will use the pre-generated object to perform cell clustering result based on biclustering results.
object <- FindClassBasedOnMC(object)
data("example_object")
example_object@MetaInfo[1:5,]
ncount_RNA | nFeature | Cluster | Seurat_r_0.8_k_20 | MC_Label | |
---|---|---|---|---|---|
OoCyte_1 | 28202.91 | 657 | 1 | 2 | 1 |
OoCyte_2 | 27029.84 | 660 | 1 | 2 | 1 |
OoCyte_3 | 26520.14 | 672 | 1 | 2 | 1 |
Zygote_1 | 24796.04 | 681 | 2 | 2 | 1 |
Zygote_2 | 23575.54 | 679 | 2 | 2 | 1 |
Check gene overlapping of FGMs. The results show first 19 FMGs have overlapping genes, and there is no overlapping gene between first 1 FMGs and the 15th FGM.
PlotNetwork(example_object,N.bicluster = c(1:20))
example_object <- RunPathway(example_object, N.bicluster =4, species = "Human",database = "GO",genes.source = "Bicluster")
DotPlotPathway(example_object,genes.source = "Bicluster")
PlotHeatmap(example_object,N.bicluster = c(1, 20),show.clusters = TRUE,show.annotation=TRUE)
PlotModuleNetwork(example_object,N.bicluster = 3,method = "spearman",
cutoff.neg = -0.5,
cutoff.pos = 0.5,
layout = "circle",
node.label = TRUE,
node.col = "black",
node.label.cex = 10)
# cell clustering results based on Seurat clustering method
PlotDimension(example_object, reduction = "umap",idents = "Seurat_r_0.8_k_20")
# cell clustering results based on MCL clustering method
PlotDimension(example_object, reduction = "umap",idents = "MC_Label")
global_marker <- FindGlobalMarkers(example_object,idents = "Seurat_r_0.8_k_20")
PlotMarkerHeatmap(Globalmarkers = global_marker,object = example_object,idents ="Seurat_r_0.8_k_20")