LFCs are non-targeting over perturb instead of the other way around
chriswzou opened this issue · 4 comments
Hi! In the vignette, it seems like the log fold changes are displayed as perturb / NT, as after this code:
de_res = Run_wmvRegDE(object = seurat_obj, assay = "RNA", slot = "counts",
labels = "gene", nt.class.name = "neg",
PRTB_list = c("GATA1", "GINS1", "MTOR"),
logfc.threshold = 0.2,
split.by = NULL)
GATA1 knockout shows a negative log fold change.
However, I discovered while using the package that in the current version of Run_wmvRegDE
, the output is NT / perturb. I think it's due to this code, where the non-targeting index is entered first.
We could simply change the code to be like this:
fc = FoldChange_new(obj = GetAssayData(object = object, slot = "data"),
cells.1 = colnames(object)[idx_P_celltype],
cells.2 = colnames(object)[idx_NT_celltype],
# mean.fxn = function(x) log(x = Matrix::rowMeans(x = expm1(x = x)) + pseudocount.use, base = base), # old version in seurat v4
mean.fxn = function(x) log(x = (rowSums(x = expm1(x = x)) + pseudocount.use)/NCOL(x), base = base),
fc.name = "avg_log2FC",
features = rownames(x = object) )
And I think that would fix things! Let me know if you'd like me to put up a PR, or if I'm barking up the wrong tree here.
Hi @chriswzou ,
Thank you for using our package and helping us improve our codes! We really appreciate all your suggestions.
However, for the logFC, this is actually intentional and not a bug. We did this to keep it consistent with the original FoldChange() function in our Seurat package. The interpretation of FoldChange(ident.1 = "group 1", ident.2 = "group 2", ...) is "the log fold change in group 1 compared to group 2". In our case, "group 1" is always the NT group. In addition, if you use FindMarkers() from Seurat package, you will observe the same behavior.
Thanks for your response! I do understand that the interpretation is log fold change between group 1 and group 2. To be really precise, what confused me is that group 1 (or in this case, the cells.1
argument) is always the NT group instead of the perturbed group.
This confused me because it means if I have a perturbation that's an ATRIP knockout, and I pass that perturbation into the perturbation slot, the reported log fold change for ATRIP will be positive. Maybe this is a convention I'm not familiar with yet, but I do think that normally when I have reported differential expression, I report it as "this gene was 5 fold lower than it was in the non-targeting control," not "this gene had a non-targeting control expression 5 fold higher than in the perturbation." Plus, the rest of the vignette does validate the DEG by showing a knockout will have a negative scaled expression—like the GATA1 example. In fact in the enrichment test section of the vignette, it actually seems that the CTRL group gets passed into the ident.2
argument, which would again lead to a negative LFC in the case of a knock out:
Idents(ifnb) = "stim"
res = DEenrich(object = ifnb,
plist = plist,
ident = "stim",
ident.1 = "STIM",
ident.2 = "CTRL",
split.by = "seurat_annotations",
direction = "up",
logfc.threshold = 0.2,
p.val.cutoff = 0.05,
min.pct = 0.1)
Ultimately totally respect the package and any preferences you have! But as a user, I would like to express that changing the vignette to be more consistent—or in my opinion, having the NT group default as group 2—might make things more clear.
Hi @chriswzou ,
Thank you! You definitely raised a good point here. And it is true that in the other parts of the vignette we treat the directions differently from FoldChange(). We will carefully discuss over it and will update our package accordingly in our next release.
Again thank you for bringing this up. Truly appreciate your suggestions for our package!
Fixed in the latest update.