comprna/ORQAS

Not much documentation for file headers.


Hi,

could you please provide more information for the validateiso.txt headers?

cds gene n_cds cov_ribo cov_rna f1 f2 ribo_reads pme

Some of these are obvious, but how exactly are cov_ribo and cov_rna calculated? What are f1 and f2?

Thanks,
Alex

The following information has been added to the documentation:

trans: transcript id or transcript-to-CDS equivalence id
gene: gene id
n_cds: number of different CDS for that gene
cov_ribo: % of bases with a non-zero count of Ribo-seq reads (extracted from the Ribomap counts-per-base output file)
cov_rna: % of bases with a non-zero count of RNA-seq reads (extracted from the Ribomap counts-per-base output file)
f1: proportion [0-1] of reads consistent with the annotated frame (frame 1)
f2: proportion [0-1] of reads consistent with a frame shift of +1 base with respect to the annotated frame (frame 2)*
ribo_reads: number of Ribo-seq reads as provided by Ribomap
pme: measure of the uniformity of the Ribo-seq reads along the transcript, expressed as Percentage of Maximum Entropy
*f3, the proportion of reads consistent with a frame shift of +2 bases with respect to the annotated frame, is not reported; it is the remainder, 1-(f1+f2), or 100-(f1+f2) if the values are expressed as percentages
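
As a rough illustration of the definitions above, the following Python sketch computes a coverage percentage from a list of per-base counts and derives f3 from f1 and f2. It is not the ORQAS code: the function names, the example counts and the per-frame read tallies are hypothetical, and the way Ribomap output is parsed is left out entirely.

```python
def coverage_percent(counts):
    """Percentage of positions whose count is not 0 (as cov_ribo / cov_rna are described)."""
    if not counts:
        return 0.0
    covered = sum(1 for c in counts if c != 0)
    return 100.0 * covered / len(counts)


def frame_fractions(frame_counts):
    """Turn read tallies for frames 1, 2 and 3 into proportions in [0, 1]."""
    total = sum(frame_counts)
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(c / total for c in frame_counts)


# Hypothetical per-base Ribo-seq counts for one CDS (made-up numbers).
ribo_counts = [0, 3, 0, 0, 5, 1, 0, 2]
cov_ribo = coverage_percent(ribo_counts)

# Hypothetical read tallies per frame; only f1 and f2 appear in the file.
f1, f2, f3 = frame_fractions([6, 3, 1])

# f3 can be recovered from the two reported columns.
f3_from_columns = 1.0 - (f1 + f2)
```

Because the three proportions sum to 1, storing f1 and f2 is enough; f3 only needs to be recomputed when it is actually required.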

Further information on the calculation of the uniformity (pme) and periodicity (f1, f2) measures can be found in the related publication: https://doi.org/10.1038/s41467-020-15634-w
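
For intuition only, a percentage of maximum entropy can be sketched as the Shannon entropy of the read distribution divided by the largest entropy possible for that number of positions. This is a generic formulation and an assumption on my part; the exact procedure used by ORQAS (for example any binning of positions, or whether the value is reported on a 0-100 or 0-1 scale) is described in the publication and may differ. The function name and inputs below are hypothetical.

```python
import math


def percent_max_entropy(counts):
    """Shannon entropy of the counts as a percentage of the maximum, log(n)."""
    total = sum(counts)
    n = len(counts)
    if total == 0 or n < 2:
        return 0.0
    # Positions with zero reads contribute nothing to the entropy.
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 100.0 * entropy / math.log(n)


# Reads spread evenly over all positions: close to 100.
uniform = percent_max_entropy([5, 5, 5, 5])

# All reads piled on a single position: 0.
peaked = percent_max_entropy([20, 0, 0, 0])
```

Under this formulation, a higher value means the reads are spread more evenly along the transcript, and a lower value means they are concentrated in a few positions.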