leekgroup/recount-website

<NA> values in rse_tx.RData


Hey

I've noticed that some transcripts contain NAs in the assay count table, e.g. ENST00000622420.1 in DRP001055, where all four samples are NA. In the GTEx data there are a total of 4.2 million NAs; for transcript ENST00000604479.5, for example, only a subset of the samples is NA.
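One way to check this pattern is to count, for each transcript, how many samples are NA, and to separate transcripts that are NA in every sample from those that are NA in only some. Below is a minimal Python sketch. It assumes the transcript count assay has already been exported from `rse_tx.RData` into a NumPy array (transcripts x samples) with missing values stored as `NaN`. The function name `summarize_na` and the toy matrix are hypothetical.

```python
import numpy as np


def summarize_na(counts: np.ndarray) -> dict:
    """Summarize NaN patterns in a transcripts x samples count matrix."""
    na = np.isnan(counts)                  # boolean mask of missing values
    na_per_transcript = na.sum(axis=1)     # number of NA samples per transcript
    n_samples = counts.shape[1]
    return {
        "total_na": int(na.sum()),
        # transcripts that are NA in every sample
        "all_na": np.flatnonzero(na_per_transcript == n_samples),
        # transcripts that are NA in some, but not all, samples
        "partial_na": np.flatnonzero(
            (na_per_transcript > 0) & (na_per_transcript < n_samples)
        ),
        # number of NA transcripts per sample
        "na_per_sample": na.sum(axis=0),
    }


# Toy matrix (made-up values): 3 transcripts x 4 samples.
toy = np.array([
    [10.0, 5.0, 0.0, 3.0],              # no NAs
    [np.nan, np.nan, np.nan, np.nan],   # NA in every sample
    [2.0, np.nan, 7.0, np.nan],         # NA in a subset of samples
])
summary = summarize_na(toy)
print(summary["total_na"], summary["all_na"], summary["partial_na"])  # prints: 6 [1] [2]
```

On a real matrix, `na_per_sample` can be lined up against each sample's read length. This is useful because, as the reply in this thread explains, differing read lengths affect which transcripts can be estimated.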

Could you please verify this and let me know how to interpret it? I've been struggling with this for a few days now.

Thank you

Hi @hwartmann,

This is basically the same as the second question in leekgroup/recount#18 that Jack Fu @JMF47 will answer.

Best,
Leo

JMF47 commented

Hi @hwartmann, I have responded in the other thread. A brief recap: when the read lengths of samples differ, our ability to estimate transcript abundances differs as well.

Thank you for getting back to me @JMF47

So what do you suggest doing with these transcripts? Can I set the NAs to zero, or should I drop any transcript that contains an NA?

JMF47 commented

What is your objective? I would recommend against setting NAs to 0. Whether or not you drop a transcript that contains any NAs depends on what you would like to do with the data.
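The options discussed here can be sketched in a few lines. You can drop every transcript with at least one NA, or you can keep all transcripts and use summaries that skip missing values. Zero-filling is included only to show why it is discouraged: it turns "could not be estimated" into "estimated as zero". This is a minimal Python sketch, assuming a NumPy transcripts x samples matrix with `NaN` for missing values. The toy values are made up.

```python
import warnings

import numpy as np

# Toy matrix (made-up values): 3 transcripts x 4 samples.
counts = np.array([
    [10.0, 5.0, 0.0, 3.0],
    [np.nan, np.nan, np.nan, np.nan],
    [8.0, np.nan, 8.0, np.nan],
])

# Option 1: drop every transcript that has an NA in any sample.
complete_rows = ~np.isnan(counts).any(axis=1)
dropped = counts[complete_rows]

# Option 2: keep all transcripts and use NaN-aware summaries, so missing
# samples are skipped. An all-NA row stays NaN (nanmean warns about it).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    nan_means = np.nanmean(counts, axis=1)

# Discouraged: zero-filling treats "not estimable" as "zero expression".
zero_filled_means = np.nan_to_num(counts, nan=0.0).mean(axis=1)

print(dropped.shape, nan_means, zero_filled_means)
```

In this toy case, zero-filling halves the third transcript's mean (4.0 instead of 8.0) and reports the all-NA transcript as unexpressed. Which option fits depends on the analysis, as noted above.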

Will I run into the same issue if I work with recount2 gene or exon counts?

JMF47 commented

I do not believe so, but @lcolladotor can chime in on the gene and exon count front.

OK, thanks. In any case, we do not really understand how the situation you described can result in NAs. Could you elaborate a bit more, or point us to a source that explains this?

JMF47 commented

See https://www.biorxiv.org/content/biorxiv/early/2018/01/12/247346.full.pdf, in particular the estimation of the feature matrix. That matrix gives the expected number of counts falling into each exon/junction feature for a random read of a given read length.
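The following sketch shows one way the feature matrix can change with read length. It considers only the simplest kind of feature, a read lying entirely inside one exon. Assume read start positions are uniform along a transcript of length T. A read of length L then has T - L + 1 valid starts, and max(0, E - L + 1) of those keep it inside an exon of length E. A short exon therefore receives expected counts from short reads but none from long reads. This Python sketch illustrates that assumption only. It is not the estimator from the linked preprint, it ignores junction features, and the function names and exon lengths are hypothetical.

```python
import numpy as np


def contained_starts(exon_len: int, read_len: int) -> int:
    """Start positions at which a read lies entirely within one exon."""
    return max(0, exon_len - read_len + 1)


def exon_body_fractions(exon_lens, read_len):
    """Expected fraction of a transcript's reads falling fully inside each exon.

    Assumes the transcript is the concatenation of `exon_lens` and that read
    starts are uniform over the T - L + 1 valid positions (T = transcript
    length, L = read length). Reads crossing a junction are not counted here.
    """
    total_starts = sum(exon_lens) - read_len + 1
    if total_starts <= 0:
        return np.zeros(len(exon_lens))  # transcript shorter than the read
    return np.array(
        [contained_starts(e, read_len) for e in exon_lens]
    ) / total_starts


# Hypothetical transcript with a short 30 bp middle exon.
exons = [200, 30, 200]
for L in (25, 76):
    print(L, exon_body_fractions(exons, L).round(3))
```

With 25 bp reads the 30 bp exon has 6 fully contained start positions. With 76 bp reads it has none, so in this simplified model any evidence for that exon would have to come from junction-spanning reads instead.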

There are no NAs on the counts for the gene/exon RSE objects. The counting method is different for those than for the transcript ones. See https://f1000research.com/articles/6-1558/v1 for the gene/exon ones.