ksahlin/isONclust

TypeError with simulated fastq data

jkomyno opened this issue · 1 comment

Hi, I get a TypeError when I run the following command:

isONclust --t 1 --consensus --k 13 --w 20 --batch_type total_nt \
  --aligned_threshold 0.4 --min_prob_no_hits 0.1 \
  --fastq /data/simulated_aligned_reads.fastq \
  --outfolder /data/isONclust/git-issue

The error log is:

started sorting seqs
0 reads processed.
10000 reads processed.
Traceback (most recent call last):
  File "isONclust", line 263, in <module>
    main(args)
  File "/isONclust/isONclust", line 47, in main
    sorted_reads_fastq_file = get_sorted_fastq_for_cluster.main(args)
  File "/isONclust/modules/get_sorted_fastq_for_cluster.py", line 233, in main
    read_array, error_rates = fastq_single_core(args)
  File "/isONclust/modules/get_sorted_fastq_for_cluster.py", line 146, in fastq_single_core
    exp_errors_in_kmers = expected_number_of_erroneous_kmers(qual, k)
  File "/isONclust/modules/get_sorted_fastq_for_cluster.py", line 25, in expected_number_of_erroneous_kmers
    prob_error = [D[char_] for char_ in quality_string]
TypeError: 'NoneType' object is not iterable

I'm running Python 3.9 on Ubuntu 20.04, but I don't believe this has anything to do with the issue.
The data is simulated with NanoSim.

I've added the simulated fastq file here, so you can test it.

Update: it's possible that this problem is due to how NanoSim creates simulated fastq files, see bcgsc/NanoSim#107.
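
For anyone hitting the same traceback: the error says that quality_string was None for at least one read, which would be consistent with a record that has no usable quality line. Below is a minimal sketch of a check for that, not part of isONclust or NanoSim. The function name find_malformed_fastq_records is hypothetical, and the sketch assumes strict four-line FASTQ records (header, sequence, separator, quality) with no line wrapping.

```python
from itertools import zip_longest


def find_malformed_fastq_records(lines):
    """Return (record_index, reason) tuples for records that break
    the four-line FASTQ layout. Hypothetical helper, not isONclust code."""
    stripped = [line.rstrip("\n") for line in lines]
    problems = []
    # Group the lines into chunks of four; a short final chunk is padded with None.
    chunks = zip_longest(*[iter(stripped)] * 4)
    for index, (header, seq, sep, qual) in enumerate(chunks):
        if not header.startswith("@"):
            problems.append((index, "header does not start with '@'"))
        elif seq is None or sep is None or qual is None:
            problems.append((index, "record is truncated"))
        elif not sep.startswith("+"):
            problems.append((index, "separator does not start with '+'"))
        elif len(qual) != len(seq):
            problems.append((index, "quality length differs from sequence length"))
    return problems
```

To run it on the file from the command above, open /data/simulated_aligned_reads.fastq and pass the file handle to the function. An empty list means every record matched the assumed layout, in which case the cause lies elsewhere.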