[Feature Request] Get barcodes from Fastx function
Opened this issue · 1 comments
hotsoupisgood commented
Hi, I would like to get barcode counts using the Fastx function.
If this is already an option, please let me know.
Right now I am calling:
for currentRead in pyfastx.Fastx(fastqFile):
    ...
to iterate through the reads and gather some statistics, but I would also like barcode counts. With plain Python I do it like this, but it's quite slow:
import gzip

barcodes = {}
with gzip.open(myFastq) as fastq:
    for line in fastq:
        # skip anything that does not look like a header line
        if not line.startswith(b'@'): continue
        bc = line.decode("utf-8").split(':')[-1].strip()
        # print(bc)
        if bc not in barcodes:
            barcodes[bc] = 1
        else:
            barcodes[bc] += 1
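One caveat with the loop above: in FASTQ, a quality line can also begin with '@' (it is a valid Phred+33 character), so testing the first character may count some quality lines as headers. A minimal sketch of a plain-Python variant that avoids this by reading only every fourth line is below. It assumes strictly four-line records (no wrapped sequences) and that the barcode is the last ':'-separated field of the header; the function name count_barcodes_plain is made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def count_barcodes_plain(path):
    """Count barcodes from the header (1st) line of each 4-line FASTQ record."""
    counts = Counter()
    with gzip.open(path, "rt") as fastq:
        # Step through the file four lines at a time so that only
        # header lines are inspected, never quality lines.
        for header in islice(fastq, 0, None, 4):
            # Assumption: the barcode is the last ':'-separated field.
            counts[header.rsplit(":", 1)[-1].strip()] += 1
    return counts
```

This still decompresses and decodes the whole file in Python, so it is unlikely to be much faster than the original loop; it only addresses the header detection.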
Fastx has sped up some of my other data collection functions so I was hopeful it could do this too!
Thank you
lmdu commented
You can use the comment=True option to get the header line content after the first whitespace, which is where you may find the barcode.
fq = pyfastx.Fastx(fastqFile, comment=True)
for name, seq, qual, comment in fq:
    # use split to get the barcode from the comment variable, like this
    bc = comment.split(':')[-1]
Hope this can help you.
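To turn this into the barcode counts asked for, the loop can feed a collections.Counter. The sketch below is one possible way to do it, not a pyfastx feature: count_barcodes is a hypothetical helper that accepts any iterable of (name, seq, qual, comment) tuples, and it assumes the barcode is the last ':'-separated field of the comment.

```python
from collections import Counter


def count_barcodes(records):
    """Tally barcodes from an iterable of (name, seq, qual, comment) tuples."""
    counts = Counter()
    for _name, _seq, _qual, comment in records:
        # Assumption: the barcode is the last ':'-separated field of the comment.
        counts[comment.rsplit(":", 1)[-1].strip()] += 1
    return counts


# Usage with pyfastx (fastqFile is the path from the original post):
#   import pyfastx
#   barcodes = count_barcodes(pyfastx.Fastx(fastqFile, comment=True))
#   print(barcodes.most_common(10))
```

Keeping the helper independent of pyfastx means it can be checked on a few hand-written tuples before running it on a full FASTQ file.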