lmdu/pyfastx

Read comment

esteinig opened this issue · 4 comments

Hey, nice tool - quite useful to extend pyfaidx to Fastq. Is there any chance you could implement reading the comment on a read header?

Currently the only accessible attribute is read.name when iterating over pyfastx.Fastq.

Also, on that note, is there a function to write the complete read back to file? Something like:

for read in fai:
    output.write(str(read))

This will write the sequence, but not the complete read.

Here is a simple Python function for now:

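from typing import Optional


def build_read_string(read, fastq: bool = False, comment: Optional[str] = None):
    """Build a FASTQ or FASTA record string from a pyfastx read."""
    if fastq:
        # Header (name plus optional comment), sequence, separator, qualities
        return f"@{read.name}{' ' + comment if comment else ''}" \
            f"\n{read.seq}\n+\n{read.qual}"
    else:
        # FASTA output: name and sequence only
        return f">{read.name}\n{read.seq}"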

lmdu commented

Good suggestion! In later versions, I will consider adding a ".raw" attribute to the read and sequence objects to get the raw string as it appears in the file. But I am not sure if the read comment is important. In many fastq files, the comment line only contains a '+' char.

Thanks, that's great to hear! I was imprecise when I said comment, which was a reference to the pysam comment read attribute, containing the content after the read name. Sometimes it contains useful information, for example when generating Fastq files from nanopore basecalling:

@8dc817b4-9485-4b09-884f-c5b4fd741d75 runid=9e281aa698a86f2cde7f5c6db95cdfa8b3edd3ff read=58861 ch=178 start_time=2019-07-30T21:52:20Z

In this case it would be useful to be able to access the string after the @name, from the fields runid to start_time.
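
To illustrate what I mean, here is a minimal sketch in plain Python that splits such a header line into the name, the comment string, and its key=value fields. It assumes the header is already available as a string (for example, read directly from the file); `split_header` is a hypothetical helper for illustration, not part of pyfastx.

```python
def split_header(header: str):
    """Split a FASTQ header line into (name, comment, fields).

    Hypothetical helper for illustration; not a pyfastx function.
    """
    text = header.strip()
    if text.startswith("@"):
        text = text[1:]
    # The name is everything up to the first whitespace; the rest is the comment.
    parts = text.split(None, 1)
    name = parts[0] if parts else ""
    comment = parts[1] if len(parts) > 1 else ""
    # Parse whitespace-separated key=value tokens; tokens without '=' are skipped.
    fields = dict(tok.split("=", 1) for tok in comment.split() if "=" in tok)
    return name, comment, fields


header = (
    "@8dc817b4-9485-4b09-884f-c5b4fd741d75 "
    "runid=9e281aa698a86f2cde7f5c6db95cdfa8b3edd3ff "
    "read=58861 ch=178 start_time=2019-07-30T21:52:20Z"
)
name, comment, fields = split_header(header)
print(name)
print(fields["ch"])  # 178
```

The comment string returned here is what I would like to be able to get from the read object directly.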