Overlapping labels in PCAplot

Question

Overlapping labels in PCAplot

yifangt opened this issue 6 years ago · 3 comments

Not an issue, but a favour needed.
How to avoid the overlapping labels in my PCAplot while I'm trying some other tools like ggrepel, which seems not as easy as other examples as PCAplot is embedded within the package?
Thanks a lot!
Yifang

Answer 1 · 2018-06-07T05:08:10.000Z

Hi,
thanks for the suggestion. I'll think about that but it seems difficult without using a specific package and I'd like to avoid adding another dependency. Feel free to let me know if you have any idea!
Hugo

Answer 2 · 2018-06-08T23:57:12.000Z

Thanks Hugo!
I played with my data with your PCAPlot() function. Seems some improvement to get what I want, but I found the percentage of the PCA components changed as compared what I got from SARTools output.
Yours: PC1 - 46.55%, PC2 - 17.93%, PC3 - 8,27%
Mine: PC1 - 88.18%, PC2 - 5.00%, PC3 - 4.46%
Struggled for a while, but unable to resolve the reason. Here is my code:

library(SARTools)
library(DESeq2)
library(ggplot2)			#
library(ggrepel)
library(gridExtra)			# For grid.arrange()

cairoSizeWrapper <- function(pixel_re){
  if (options("bitmapType") == "cairo"){
    return(min(pixel_re,32767))
  } else{
    return(pixel_re)
  }
}

PCAPlot <- function(counts.trans, group, n=min(500, nrow(counts.trans)), 
                    col=c("lightblue","orange","MediumVioletRed","SpringGreen"),
                    outfile=TRUE){
  
  rv = apply(counts.trans, 1, var, na.rm=TRUE)
  pca = prcomp(t(counts.trans[order(rv, decreasing = TRUE), ][1:n,]))
  prp <- pca$sdev^2 * 100 / sum(pca$sdev^2)
  prp <- round(prp[1:3],2)

# create figure
  if (outfile) png(filename="figures/PCA.png",width=cairoSizeWrapper(1800*2),height=cairoSizeWrapper(1800),res=300)
    par(mfrow=c(1,2), cex=0.6)
	#axes 1 et 2
	abs=range(pca$x[,1]); abs=abs(abs[2]-abs[1])/25;
    ord=range(pca$x[,2]); ord=abs(ord[2]-ord[1])/25;
    plot(pca$x[,1], pca$x[,2],
         las = 1, cex = 2, pch = 16, col = col[as.integer(group)],
	     xlab = paste0("PC1 (",prp[1],"%)"), 
	     ylab = paste0("PC2 (",prp[2],"%)"), 
	     main = "Principal Component Analysis - Axes 1 and 2")
    abline(h=0,v=0,lty=2, col="lightgray")
    text(pca$x[,1] - ifelse(pca$x[,1]>0,abs,-abs), pca$x[,2] - ifelse(pca$x[,2]>0,ord,-ord),
         colnames(counts.trans), col=col[as.integer(group)])

	# axes 1 et 3
	abs=range(pca$x[,1]); abs=abs(abs[2]-abs[1])/25;
    ord=range(pca$x[,3]); ord=abs(ord[2]-ord[1])/25;
    plot(pca$x[,1], pca$x[,3],
         las = 1, cex = 2, pch = 16, col = col[as.integer(group)],
	     xlab = paste0("PC1 (",prp[1],"%)"), 
	     ylab = paste0("PC3 (",prp[3],"%)"), 
	     main = "Principal Component Analysis - Axes 1 and 3")
    abline(h=0,v=0,lty=2,col="lightgray")
    text(pca$x[,1] - ifelse(pca$x[,1]>0,abs,-abs), pca$x[,3] - ifelse(pca$x[,3]>0,ord,-ord),
         colnames(counts.trans), col=col[as.integer(group)])
  if (outfile) dev.off()

  return(invisible(pca$x))
}

colors <- c("dodgerblue","firebrick1", "MediumVioletRed","SpringGreen")               # vector of colors of each biological condition on the plots
varInt <- "Treatment"    
typeTrans <- "VST" 
condRef <- "Ctr"
locfunc <- "median" 
batch <- NULL    
featuresToRemove <- c("alignment_not_unique",        		
                      "ambiguous", "no_feature",     		
                      "not_aligned", "too_low_aQual")		
#Loading target file
targetFile <- "/storage/ppl/yifang/20180227_RNAseq/scripts/target_all.txt" 
target <- loadTargetFile(targetFile=targetFile, varInt=varInt, condRef=condRef, batch=batch)

# loading counts
# This will create dataframe for vst.
counts <- loadCountData(target=target, rawDir=rawDir, featuresToRemove=featuresToRemove)
counts.trans <- vst(counts)

PCAPlot(counts.trans, group=target[,varInt], n = min(500, nrow(counts.trans)), col=c("Blue","Red", "Orange","Green"), outfile=TRUE)

vst was used for normalization, but not sure anything is broken when only PCAPlot() was used.
I appreciate if you could give me some clue to resolve the inconsistency.

Answer 3 · 2018-06-09T13:32:55.000Z

Hi,
the only difference I can see is the way of transforming the counts. In my exploreCounts() function I use assay(varianceStabilizingTransformation(object)) where object is a DESeqDataSet containing the counts, dispersions, regression coefficients. Could you try this method instead of using directly the vst() function?
Hugo