TypeError in datafunk
ArtPoon commented
Encountered the following exception while running datafunk on a recent dump of the GISAID CoV database:
(pangolin) art@orolo:~/git/covizu/data$ datafunk sam_2_fasta -s /home/art/git/covizu/data/reference_mapped.sam -r /home/art/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangolin-2.0.4-py3.6.egg/pangolin/data/reference.fasta -o /home/art/git/covizu/data/post_qc_query.aligned.fasta -t [265:29674] --pad --log-inserts
Traceback (most recent call last):
  File "/home/art/miniconda3/envs/pangolin/bin/datafunk", line 8, in <module>
    sys.exit(main())
  File "/home/art/miniconda3/envs/pangolin/lib/python3.6/site-packages/datafunk/__main__.py", line 1010, in main
    args.func(args)
  File "/home/art/miniconda3/envs/pangolin/lib/python3.6/site-packages/datafunk/subcommands/sam_2_fasta.py", line 87, in run
    trimend = trimend)
  File "/home/art/miniconda3/envs/pangolin/lib/python3.6/site-packages/datafunk/sam_2_fasta.py", line 269, in sam_2_fasta
    seq = get_seq_from_block(sam_block = one_querys_alignment_lines, rlen = RLEN, log_inserts = log, pad = pad)
  File "/home/art/miniconda3/envs/pangolin/lib/python3.6/site-packages/datafunk/sam_2_fasta.py", line 201, in get_seq_from_block
    seq_flat_no_internal_gaps = swap_in_gaps_Ns(block_lines_sites_list[0], pad = pad)
  File "/home/art/miniconda3/envs/pangolin/lib/python3.6/site-packages/datafunk/sam_2_fasta.py", line 172, in swap_in_gaps_Ns
    for x in re.findall(r_internal, seq):
  File "/home/art/miniconda3/envs/pangolin/lib/python3.6/re.py", line 222, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
ArtPoon commented
Modification from line 268 in datafunk/sam_2_fasta.py:
    try:
        seq = get_seq_from_block(sam_block = one_querys_alignment_lines, rlen = RLEN, log_inserts = log, pad = pad)
    except:
        print(query_seq_name)
        print(one_querys_alignment_lines)
        raise
yielded:
hCoV-19/pangolin/Guangxi/P4L/2017|EPI_ISL_410538|2017
[<pysam.libcalignedsegment.AlignedSegment object at 0x7f885a4b7ac8>]
So, yeah, let's not try to classify non-human genomes!
ArtPoon commented
My guess is that reads that fail to align to the reference are stored as None objects in the AlignedSegment object. There should be an exception handler for such cases.
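A minimal sketch of what such a guard might look like, assuming the failure comes from a record whose sequence is None (in pysam, `AlignedSegment.query_sequence` is None when the record carries no sequence). `FakeSegment` and `usable_segments` are hypothetical names for illustration only, not part of datafunk or pysam; the stand-in class just keeps the example runnable without pysam installed.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for pysam.AlignedSegment, with only the attributes used here.
FakeSegment = namedtuple("FakeSegment", ["query_name", "query_sequence", "is_unmapped"])


def usable_segments(segments):
    """Split records into those with a sequence and the names of those skipped."""
    kept, skipped = [], []
    for seg in segments:
        # Skip unmapped records and records with no sequence instead of
        # passing None on to re.findall, which raises the TypeError above.
        if seg.is_unmapped or seg.query_sequence is None:
            skipped.append(seg.query_name)
        else:
            kept.append(seg)
    return kept, skipped


records = [
    FakeSegment("query_A", "ACGTNNACGT", False),
    FakeSegment("query_B", None, True),  # failed to align, no sequence
]
kept, skipped = usable_segments(records)

# Every remaining sequence is a real string, so the regex call is safe.
for seg in kept:
    print(seg.query_name, re.findall(r"N+", seg.query_sequence))
for name in skipped:
    print("skipped (no aligned sequence):", name)
```

Filtering up front rather than catching the TypeError keeps the skipped query names available for logging, which would have pointed straight at the pangolin genome here.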