Enquiry on downsampling to ensure balance of conditions along pseudotime for conditionTest
JesseRop opened this issue · 2 comments
Dear developers,
Thank you very much for this great tool.
I am running differential expression (DE) between two conditions that are unbalanced along pseudotime, and I am trying to understand whether downsampling is necessary, since it discards cells and therefore reduces power.
This is my original dataset
I have then downsampled to this
But I lose lots of cells, and I have other similar datasets where I am also losing cells when downsampling.
My question is whether I can run tradeSeq on the original dataset without downsampling, or whether the downsampling approach I have applied to ensure balance along pseudotime is the correct way to go about it.
thanks,
Jesse
Hello,
There is no need to downsample. The covariance matrix of the coefficients for each condition reflects the number of cells in that condition, so the resulting uncertainty is carried through to conditionTest.
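To illustrate the general point about sample size and coefficient uncertainty, here is a minimal base-R sketch. It uses a plain Poisson GLM on simulated data as a simplified stand-in, not tradeSeq's actual negative binomial GAM, and all names and numbers (`n_a`, `n_b`, the simulated counts) are made up for the example.

```r
set.seed(1)

# Hypothetical toy data: condition A has 500 cells, condition B only 50
n_a <- 500
n_b <- 50
cond <- factor(rep(c("A", "B"), times = c(n_a, n_b)))
pt <- runif(n_a + n_b)  # stand-in for pseudotime

# Same true expression trend in both conditions
y <- rpois(n_a + n_b, lambda = exp(1 + 0.5 * pt))

# Separate intercept and slope per condition
fit <- glm(y ~ 0 + cond + cond:pt, family = poisson())

# Standard errors come from the coefficient covariance matrix;
# the condition with fewer cells gets the larger ones
se <- sqrt(diag(vcov(fit)))
print(round(se, 3))
```

In this toy setting the standard errors for condition B are larger than those for condition A, so a Wald-type test built on `vcov(fit)` is already more conservative where there are fewer cells.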
In your case, though, the red condition does not seem to follow a trajectory but rather to form very clear clusters. I would therefore be careful with the number of knots, and be sure to test with a log-fold-change cutoff using the l2fc argument.
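As a rough sketch of what this advice could look like in practice, the snippet below runs fitGAM with a conditions factor and then conditionTest with an l2fc cutoff. The data are simulated placeholders (a single lineage, random counts, unbalanced conditions), and the values `nknots = 6` and `log2(1.25)` are illustrative choices rather than recommendations.

```r
library(tradeSeq)

set.seed(42)
n_cells <- 300
n_genes <- 20

# Hypothetical single-lineage toy data, deliberately unbalanced (~70% / ~30%)
pseudotime <- matrix(runif(n_cells, 0, 10), ncol = 1)
cell_weights <- matrix(1, nrow = n_cells, ncol = 1)
conditions <- factor(sample(c("ctrl", "treated"), n_cells,
                            replace = TRUE, prob = c(0.7, 0.3)))

# Random negative binomial counts, genes in rows and cells in columns
counts <- matrix(rnbinom(n_genes * n_cells, mu = 5, size = 2),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("cell", seq_len(n_cells))))

# Fit one smoother per condition; no downsampling is applied
sce <- fitGAM(counts = counts, pseudotime = pseudotime,
              cellWeights = cell_weights, conditions = conditions,
              nknots = 6, verbose = FALSE)

# Test for differences between conditions above a fold-change cutoff
res <- conditionTest(sce, l2fc = log2(1.25))
head(res[order(res[, "pvalue"]), ])
```

Because the counts here are pure noise, few or no genes should come out as significant; with real data, the number of knots is worth checking (for example with evaluateK) before interpreting the results.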
Dear @HectorRDB,
Many thanks for your response. It is very helpful!
Biologically we expect that the cluster in the red condition at the very terminal end (bottom right) is a much more developmentally advanced population than all the other cells.
I have been able to run fitGAM (with nknots = 6) and then conditionTest with a log2(1.25) threshold in the l2fc argument.
Using plotSmoothers to visualize expression per gene, it seems the smoother lines for each condition always lie below the average expression of the first population on the left half of the plots. I have given two examples below. I have tried a range of knots (3-7), but it doesn't change much. Kindly advise on whether this is expected. Could it be due to the gap between the red condition populations?
Thanks!
Below are plots for the same genes generated in ggplot (+ geom_point() + geom_smooth(method = 'gam', formula = y ~ s(x, bs = "cs"))).