I wonder 'else' clause should be commented out (or deleted) in prepDE.py3, in the block of badGenes check
mickeykawai opened this issue · 0 comments
I wonder 'else' clause (at l.172-173 of version v2.2.1 Latest on Jan 26, 2022) should be commented out (or deleted) in the block of badGenes check in prepDE.py3
.
for s in samples:
badGenes=[] #list of bad genes (just ones that aren't MSTRG)
try:
...
except StopIteration:
warnings.warn("Didn't get a GTF in that directory. Looking in another...")
else: #we found the "bad" genes!
break
The present version with the else
clause results in setting geneIDs[t_id]=g_id
in the for
loop only for the first sample but not the rest of the samples.
When there are expression differences among alternative splicing variants among samples, the number of the gene of some samples can be fewer in gene_count_matrix.csv
than the sum of splicing variants seen in transcript_count_matrix.csv
.
Then at l.279,
geneDict[geneIDs[i]][s[0]]+=v[s[0]]
when the geneIDs
does not have corresponding record of g_id
for a t_id
, <class 'str'>
will be set (and appear at the last line of gene_count_matrix.csv
.
Example of my observation for a gene is as follows.
With the present version:
$ grep '\bSR34\b' gene_count_matrix.csv
MSTRG.110|SR34,99,29,0,50,0,0,24,96,24 # <- the value of the last 2 columns, i.e. 96 and 24, is not the sum of the count of 3 transcripts.
$ grep AT1G02840 transcript_count_matrix.csv
AT1G02840.1,44,29,0,50,0,0,24,96,24
AT1G02840.2,55,0,0,0,0,0,0,0,0
AT1G02840.3,,0,0,0,0,0,0,18,32
$
After comment out of the else
block:
$ grep '\bSR34\b' gene_count_matrix.csv
MSTRG.110|SR34,99,29,0,50,0,0,24,114,56
$ grep AT1G02840 transcript_count_matrix.csv
AT1G02840.1,44,29,0,50,0,0,24,96,24
AT1G02840.2,55,0,0,0,0,0,0,0,0
AT1G02840.3,,0,0,0,0,0,0,18,32
$