lmdu/pyfastx

[Feature Request] Get barcodes from Fastx function

Opened this issue · 1 comment

Hi, I would like to get barcode counts using the Fastx function.
If this is already an option, please let me know.

Right now I am calling:
for currentRead in pyfastx.Fastx(fastqFile): ...
to iterate through the reads and collect some statistics, but I would also like barcode counts. With plain Python I do it like this, but it's quite slow:

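import gzip
from collections import Counter

barcodes = Counter()
# "rt" opens the gzip stream in text mode, so no per-line decode is needed
with gzip.open(myFastq, "rt") as fastq:
    for i, line in enumerate(fastq):
        # Assumes 4-line FASTQ records: the header is every 4th line.
        # Checking for a leading '@' alone also matches some quality lines.
        if i % 4 != 0: continue
        bc = line.split(':')[-1].strip()
        barcodes[bc] += 1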

Fastx has sped up some of my other data collection functions so I was hopeful it could do this too!
Thank you

lmdu commented

You can use the comment=True option to get the header line content after the first whitespace, which is where you may find the barcode.

fq = pyfastx.Fastx(fastqFile, comment=True)
for name,seq,qual,comment in fq:
    # use split to get the barcode from the comment variable, like this
    bc = comment.split(':')[-1]
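
To turn that into counts, here is a minimal sketch. The helper name count_barcodes and the demo records are made up for illustration, and it assumes the barcode is the last ':'-separated field of the comment (as in Illumina-style headers such as 1:N:0:ACGTACGT). It takes any iterable of (name, seq, qual, comment) tuples, so it can be tried without a FASTQ file.

```python
from collections import Counter


def count_barcodes(records):
    """Count barcodes from (name, seq, qual, comment) tuples.

    Assumption: the barcode is the last ':'-separated field of the comment.
    """
    counts = Counter()
    for _name, _seq, _qual, comment in records:
        # rsplit with maxsplit=1 only splits off the final field
        counts[comment.rsplit(":", 1)[-1].strip()] += 1
    return counts


# Stand-in records with made-up values, shaped like the tuples in the loop above
demo = [
    ("read1", "ACGT", "IIII", "1:N:0:ACGTACGT"),
    ("read2", "TTGA", "IIII", "1:N:0:ACGTACGT"),
    ("read3", "GGCA", "IIII", "1:N:0:TTGCAATG"),
]
barcode_counts = count_barcodes(demo)
print(barcode_counts.most_common())
```

With a real file, the call would be count_barcodes(pyfastx.Fastx(fastqFile, comment=True)). If your headers put the barcode somewhere else, only the rsplit line needs to change.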

Hope this can help you.