YosefLab/ImpulseDE2

rownames in dfAnnotation

Janderscan opened this issue · 5 comments

Hi,

First of all, congrats on developing this software and on the publication. I am trying to use it to find age-related gene expression patterns.

I just wanted to let you know that the structure of dfAnnotation described in the documentation might be misleading. I kept getting a geom_point error in the "plotGenes" function because I was not setting the rownames of dfAnnotation to the sample ids. It took me a long time to realize this, since "dfAnnotation" already contains a "Sample" column that I thought was used for linking the counts matrix and the annotation data.frame. What is that column used for, then?
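A minimal sketch of the workaround described above, assuming the values in the `Sample` column match the column names of the counts matrix. The toy data and the `Condition` and `Time` values are made up for illustration, and nothing here calls ImpulseDE2 itself:

```r
# Toy counts matrix: genes in rows, samples in columns (hypothetical data).
matCountData <- matrix(
  c(10L, 20L, 30L, 5L, 15L, 25L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3"))
)

# Annotation table; the values are illustrative only.
dfAnnotation <- data.frame(
  Sample = c("s1", "s2", "s3"),
  Condition = "case",
  Time = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Set the row names to the sample ids so rows can be looked up by sample.
rownames(dfAnnotation) <- dfAnnotation$Sample

# Sanity check: every column of the counts matrix has an annotation row.
stopifnot(all(colnames(matCountData) %in% rownames(dfAnnotation)))
```

With the row names in place, `dfAnnotation[colnames(matCountData), ]` returns the annotation rows in the same order as the count columns.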

In general, I would appreciate slightly more detailed documentation, and maybe also a method to extract the normalized counts in case someone wants to make a custom plot. For example, in DESeq2 plotCounts, there is a returnData argument to retrieve the data instead of plotting.

EDIT: when using the whole dataset, I get a "WARNING: Found size factors==0, setting these to 1."
When inspecting the size factors with get_vecSizeFactors(), I see that all samples have a size factor of 1. Is it fine to continue the analysis with all size factors set to 1? Shall I provide runImpulseDE2 with a vector of size factors calculated with DESeq2::estimateSizeFactors?

Thanks

> EDIT: when using the whole dataset, I get a "WARNING: Found size factors==0, setting these to 1." When inspecting the size factors with get_vecSizeFactors(), I see that all samples have a size factor of 1. Is it fine to continue the analysis with all size factors set to 1? Shall I provide runImpulseDE2 with a vector of size factors calculated with DESeq2::estimateSizeFactors?

If you don't provide size factors, they are computed internally. There seems to be something weird with your data in this case; you can definitely provide size factors from DESeq2.
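For illustration, here is a rough base-R sketch of median-of-ratios size factors, the general approach DESeq2 is known for. It is not DESeq2's or ImpulseDE2's actual code, `estimate_size_factors` is a hypothetical helper name, and NA counts are not handled. In practice one would call DESeq2 directly and hand the resulting named vector to runImpulseDE2 through its argument for external size factors, which appears to be `vecSizeFactorsExternal`; check `?runImpulseDE2` to confirm.

```r
# Median-of-ratios size factors, computed on the log scale.
# counts: numeric matrix with genes in rows and samples in columns.
estimate_size_factors <- function(counts) {
  log_counts <- log(counts)
  # Per-gene log geometric mean; -Inf for any gene with a zero count.
  log_geo_means <- rowMeans(log_counts)
  use <- is.finite(log_geo_means)
  if (!any(use)) stop("every gene has at least one zero count")
  # Per sample: median log-ratio to the per-gene reference, back-transformed.
  apply(log_counts, 2, function(col) {
    exp(median((col - log_geo_means)[use]))
  })
}

# Toy data in which s2 is sequenced twice as deeply as s1, and s3 four times.
counts <- matrix(
  c(10, 20, 40,
    100, 200, 400,
    5, 10, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), c("s1", "s2", "s3"))
)
vecSizeFactors <- estimate_size_factors(counts)
print(vecSizeFactors)
```

The result is a vector named by sample, which makes it easy to check that the size factors line up with the columns of the counts matrix.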

> For example, in DESeq2 plotCounts, there is a returnData argument to retrieve the data instead of plotting.

You can retrieve model fits from ImpulseDE2 similar to what's done in plotGenes; see #11 for a solution that worked for somebody else.

When using the exact same counts matrix, DESeq2 calculates the size factors correctly. I think it has to do with the way ImpulseDE2 calculates the geometric mean. For a large dataset like mine (> 250 samples), the product of all counts for a gene in (prod(gene[!is.na(gene)]))^(1/sum(!is.na(gene))) overflows the range of double precision and becomes Inf, so vecGeomMean contains only Inf values when using the whole dataset. If I use a subset of just 100 columns, it returns finite values.
DESeq2 instead uses the log-average approach for calculating the geometric mean.
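The difference between the two approaches can be reproduced with a small base-R sketch. The function names and the toy gene below are made up for illustration; only the product expression is taken from the comment above.

```r
# Geometric mean via the direct product, as in the expression quoted above.
geo_mean_prod <- function(x) {
  x <- x[!is.na(x)]
  prod(x)^(1 / length(x))
}

# Geometric mean via the mean of logs; no large intermediate product.
geo_mean_log <- function(x) {
  x <- x[!is.na(x)]
  exp(mean(log(x)))
}

# Hypothetical gene: 300 samples with 1000 counts each.
gene <- rep(1000, 300)

# 1000^300 = 1e900 is beyond the largest double (about 1.8e308): Inf.
print(geo_mean_prod(gene))

# The log-average stays finite, approximately 1000.
print(geo_mean_log(gene))

# With 100 samples the product (1e300) still fits in a double.
print(geo_mean_prod(gene[1:100]))
```

Both functions agree whenever the product fits in a double, so switching to the log-average only changes the result in the overflow case.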

@Janderscan thanks for looking into it! Do you want to start a PR with the suggested fix?

Thanks David, but it is probably better if you just include it with other fixes that may arise in the future. I just wanted to know whether there was something wrong with my counts matrix. In the end, it only affects large data sets, and it can be solved by providing the size factors calculated with DESeq2.

True, thanks again for looking into it!