longmanz/Mixscale

Expression Level of Target and Perturbation Scores Positively Correlated?

MujeebQadiri opened this issue · 5 comments

Hi!

First off, thank you for this analysis tool!

I am analyzing some Perturb-Seq data and have followed the vignette. I ran Mixscale_Scatterplot and interestingly the perturbation score seems positively correlated to the expression of my knocked-down transcription factors. I would expect a the highest perturbation score to be correlated with very low expression of the target gene.

I'm interested to know what you believe may be happening or whether I have made an error in my code. Thank you!

plot_zoom_png

Here is my code:

library(Seurat)
library(ggridges)
library(ggplot2)
library(Mixscale)
library(dplyr)

file <- "experimental"
load(paste0("data/",file,"_scMAGeCK_meta_filtered.RData"))

filtered_seurat@meta.data$perturb_status <- case_when(is.na(filtered_seurat@meta.data$gene) ~ "NP",
                                                      filtered_seurat@meta.data$gene == "Non-Targeting" ~ "NT",
                                                      .default = "KO")

# standard pre-processing
filtered_seurat = NormalizeData(filtered_seurat)
filtered_seurat = FindVariableFeatures(filtered_seurat, nfeatures = 2000)
filtered_seurat = ScaleData(filtered_seurat)
filtered_seurat = RunPCA(filtered_seurat)
Idents(filtered_seurat) <- "gene"

# calculate Perturbation signatures 
seurat_obj <- CalcPerturbSig(
  object = filtered_seurat, 
  assay = "RNA", 
  slot = "data", 
  gd.class ="gene", 
  nt.cell.class = "Non-Targeting", 
  reduction = "pca", 
  ndims = 40, 
  num.neighbors = 20, 
  new.assay.name = "PRTB", 
  split.by = NULL) 

# run mixscale
seurat_obj = RunMixscale(
  object = seurat_obj, 
  assay = "PRTB", 
  slot = "scale.data", 
  labels = "gene", 
  nt.class.name = "Non-Targeting", 
  min.de.genes = 5, 
  logfc.threshold = 0.2,
  de.assay = "RNA",
  max.de.genes = 100, 
  new.class.name = "mixscale_score", 
  fine.mode = F, 
  verbose = F, 
  split.by = NULL)

# Visualize
RidgePlot(
  seurat_obj,
  features = "mixscale_score",
  group.by = "gene") + NoLegend()

# Check if the scores correlate with the expression level of the target gene itself
Mixscale_ScatterPlot(object = seurat_obj, 
                     nt.class.name = "Non-Targeting", 
                     slct.ident = unique(seurat_obj$gene)[unique(seurat_obj$gene) != "Non-Targeting"][1:8], 
                     nbin = 10, 
                     facet_wrap = "gene") + NoLegend()

Best,
Mujeeb

Hi Mujeeb,
Thank you for using our tool. The positive correlation is indeed unexpected. Based on your results, I notice a few points that might worth looking into:

  1. the expression of these transcription factors (TFs) seem relatively low (less than 0.5). Is that as expected? Have you checked their expression level in the NT group? You may do a quick check by running DoHeatmap() for each TFs.
  2. Do you expect to observe strong batch effect in your data? You can check this by plotting a standard UMAP. Our tool currently does not work well when there is a strong batch effect.
  3. Have you checked the RidgePlot of the Mixscale scores for each perturbation? Do the distributions look normal (bi-modal)?
  4. When running RunMixscale, you may consider using a higher logfc.threshold (e.g., 0.5). This will reduce the number of small/moderate DE genes that are used for Mixscale score calculation. I suspect that these small effect DE genes mask the true signals.

Hi Mujeeb, Thank you for using our tool. The positive correlation is indeed unexpected. Based on your results, I notice a few points that might worth looking into:

  1. the expression of these transcription factors (TFs) seem relatively low (less than 0.5). Is that as expected? Have you checked their expression level in the NT group? You may do a quick check by running DoHeatmap() for each TFs.
  2. Do you expect to observe strong batch effect in your data? You can check this by plotting a standard UMAP. Our tool currently does not work well when there is a strong batch effect.
  3. Have you checked the RidgePlot of the Mixscale scores for each perturbation? Do the distributions look normal (bi-modal)?
  4. When running RunMixscale, you may consider using a higher logfc.threshold (e.g., 0.5). This will reduce the number of small/moderate DE genes that are used for Mixscale score calculation. I suspect that these small effect DE genes mask the true signals.

Hi, thank you for your reply!

  1. It seems as though the expression of some of these transcription factors is low in both the NT and perturbed cells. I will check with the scientist I am working with to see if this is expected for their experiment.

image

image

  1. I have two conditions, stimulated and control -- however, these are being processed separately and in both cases this trend can be observed.

  2. I have checked the RidgePlot and the distribution looks bimodal for most of the sgRNAs (Stat92E, foxo, REPTOR) although very weakly.

image

  1. I tried using a logfc threshold of 0.5 however the trend remains the same.

image
image

Best,
Mujeeb

Perhaps a bit off-topic but thought I should note: for guide RNA calling, is it sufficient to use the Cell Ranger algorithm to determine guide-cell assignment?

My workflow for pre-processing includes filtering the CRISPR Gene Expression matrix produced by Cell Ranger by the UMI threshold (for each sgRNA to remove ambient guide RNA) and assigning each cell's identity according to whether a sgRNA passes the threshold (and accordingly, identifying multiplets).

Best,
Mujeeb

Hi Mujeeb,
Thank you for your information. Based on your results, the positive correlation seems to be real, but it is unclear to me why that would be the case.

  1. Have you run Run_wmvRegDE() and visualize the DEGs for each perturbation (Mixscale_DoHeatmap()) ? It is possible that the expression of these TFs is not a good indicator of perturbation strength (since they are lowly expressed), and so the correlation between the mixscale scores and the expression is weak.

  2. Another possibility is that there is some batch effect between your perturbed cells and NT cells. In that case, the shift of expression is not driven by perturbation, so there is weak correlation between the mixscale score and perturbation strength.

Hi Mujeeb, Thank you for your information. Based on your results, the positive correlation seems to be real, but it is unclear to me why that would be the case.

  1. Have you run Run_wmvRegDE() and visualize the DEGs for each perturbation (Mixscale_DoHeatmap()) ? It is possible that the expression of these TFs is not a good indicator of perturbation strength (since they are lowly expressed), and so the correlation between the mixscale scores and the expression is weak.
  2. Another possibility is that there is some batch effect between your perturbed cells and NT cells. In that case, the shift of expression is not driven by perturbation, so there is weak correlation between the mixscale score and perturbation strength.

I agree with you, this may be the case because the TFs are so lowly expressed. Looking at the wmvRegDE, many of the downstream targets we expect to be affected are significantly down-regulated.

Thank you for your help!