soedinglab/plass

Output FASTA header format?

sjaenick opened this issue · 2 comments

Assembled protein sequences contain additional information, e.g.

[Orf: 39, 242, 18446744073709551615, 1, 1]

What is the meaning of these numbers? Is there any coverage information
included? (If not, can it be added?)

Thanks.

@sjaenick thank you for trying Plass. 👍

The information is currently just the open reading frame of the center fragment that was extended, so it is not very useful. You can compute the coverage by mapping the reads back to the assembled proteins using the MMseqs2 map workflow.
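If you want to pull those numbers out of the headers in the meantime, something like the following Python sketch may help. It only extracts the integers from the `[Orf: ...]` block; this thread does not document what each position means, so the code leaves them unnamed. The one thing that can be said with certainty is that 18446744073709551615 equals 2^64 - 1, the largest unsigned 64-bit value. Reading it as an "unset" placeholder is an assumption, and the sequence identifier in the example header is made up.

```python
import re

# Largest unsigned 64-bit integer; the third number in the example header equals this.
UINT64_MAX = 2**64 - 1

# Matches the bracketed block, e.g. "[Orf: 39, 242, 18446744073709551615, 1, 1]".
ORF_PATTERN = re.compile(r"\[Orf:\s*([^\]]*)\]")


def parse_orf_fields(header):
    """Return the integers inside the [Orf: ...] block, or None if it is absent."""
    match = ORF_PATTERN.search(header)
    if match is None:
        return None
    return [int(part) for part in match.group(1).split(",")]


# "hypothetical_id" is a placeholder, not a real Plass identifier.
header = ">hypothetical_id [Orf: 39, 242, 18446744073709551615, 1, 1]"
fields = parse_orf_fields(header)

# Flag values equal to UINT64_MAX (assumed here to mean "not set").
unset = [value == UINT64_MAX for value in fields]
print(fields)
print(unset)
```

Headers without an `[Orf: ...]` block return `None`, so the function can be run over a mixed FASTA file without raising.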

Thanks.