Is it possible to work on the cell*peak matrix for single cell ATAC data?

Question

Is it possible to work on the cell*peak matrix for single cell ATAC data?

Opened this issue 5 years ago · 4 comments

I wanted to try the clustering method used in your method. Is there any function in the package supporting clustering the cell*peak matrix for scATAC? (not the GUI version.) Thanks

Answer 1 · 2019-04-26T18:19:23.000Z

Hi,
there is no readily to be used R function to do clustering but I have excerpted the code in GUI to do the clustering. See below. Ideally you need to first run dimension reduction (PCA,tsne,umap...) and then run the clustering. There are many different clustering methods. 'reducedata' will be your input and 'res' is the output. Each row is a cell and each column is a reduced dimension.

reducedata <- matrix(rnorm(500),nrow=100)

kmeans

x <- 2:min(20,nrow(reducedata))
y <- sapply(x, function(k) {
set.seed(12345)
clu <- kmeans(reducedata,k,iter.max = 100)$cluster
cluSS <- sum(sapply(unique(clu),function(i) {
sum(rowSums(sweep(reducedata[clu==i,,drop=F],2,colMeans(reducedata[clu==i,,drop=F]),"-")^2))
}))
1-cluSS/sum((sweep(reducedata,2,colMeans(reducedata),"-"))^2)
})
y <- c(0,y)
x <- c(1,x)
clunum <- which.min(sapply(x, function(i) {
x2 <- pmax(0,x-i)
sum(lm(y~x+x2)$residuals^2)
}))

set.seed(12345)
res <- kmeans(reducedata,clunum,iter.max = 100)$cluster

#hclust
datahclust <- hclust(dist(reducedata))
x <- 2:min(20,nrow(reducedata))
y <- sapply(x, function(k) {
clu <- cutree(datahclust,k)
cluSS <- sum(sapply(unique(clu),function(i) {
sum(rowSums(sweep(reducedata[clu==i,,drop=F],2,colMeans(reducedata[clu==i,,drop=F]),"-")^2))
}))
1-cluSS/sum((sweep(reducedata,2,colMeans(reducedata),"-"))^2)
})
y <- c(0,y)
x <- c(1,x)
clunum <- which.min(sapply(x, function(i) {
x2 <- pmax(0,x-i)
sum(lm(y~x+x2)$residuals^2)
}))
res <- cutree(datahclust,clunum)

mclust

library(mclust)
x <- 2:min(20,nrow(reducedata))
clunum <- Mclust(reducedata,G=x,modelNames="VVV",priorControl(functionName="defaultPrior", shrinkage=0.1))$G
res <- Mclust(reducedata,G=clunum,modelNames="VVV",priorControl(functionName="defaultPrior", shrinkage=0.1))
res <- apply(res$z,1,which.max)

DBSCAN

library(dbscan)
minpts <- 20
alldist <- sort(kNNdist(reducedata,minpts))
x <- 1:length(alldist)
eps <- alldist[which.min(sapply(x, function(i) {
x2 <- pmax(0,x-i)
sum(lm(alldist~x+x2)$residuals^2)
}))]
res <- dbscan(reducedata,eps=eps,minPts=minpts)$cluster

Answer 2 · 2019-04-29T20:03:36.000Z

Got you. If I have a cellpeak matrix and I did PCA/TSNE/UMAP on it, then I use one of those clustering methods, is this the typical way for SCRAT doing clustering? or Do I need to do something else on the cellpeak matrix before I did PCA/TSNE/UMAP? Thanks.

Answer 3 · 2019-04-29T20:13:37.000Z

Yes. SCRAT by default also performs standardization for each cell across all peaks/features to have mean 0 and sd 1. The standardized data is sent to PCA/tsne/umap (Note that for PCA, standardization for each feature across all cells is still needed, so two steps of standardization) and finally do clustering.

Answer 4 · 2019-04-30T03:00:38.000Z

It's very clear now, thanks.