nanoporetech/tombo

ERROR: 'utf-8' codec can't decode byte 0x8b

sagnikbanerjee15 opened this issue · 4 comments

I am facing an error with tombo.

[18:46:58] Getting read filenames.
[18:46:59] Parsing sequencing summary files.
******************** WARNING ********************
	Some FASTQ records from sequencing summaries do not appear to have a matching file.
[18:47:07] Annotating FAST5s with sequence from FASTQs.
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tombo/_preprocess.py", line 148, in _feed_seq_records_worker
    fastq_rec = list(islice(fastq_fp, 4))
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I used multi_to_single_fast5 to convert the multi-read FAST5 files to single-read files, but the error persists. Could you please take a look?

I am executing the following command:

tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5_pass_barcode77 --fastq-filenames fastq_pass/barcode77/fastq_pass_barcode77.fastq --sequencing-summary-filenames ../sequencing_summary_FAT23762_13f74adb.txt --overwrite --processes 8

Thank you.

@sagnikbanerjee15 I have the same issue. Did you ever find a solution?

No sorry, I had to give up.


@sagnikbanerjee15 I had this error with a gzipped file and fixed it by running gunzip on it. Are you sure your file is named correctly and is already decompressed?
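
For anyone who wants to check this before rerunning tombo: a gzip stream always starts with the two bytes 0x1f 0x8b, which matches the "byte 0x8b in position 1" in the traceback. Below is a minimal sketch of that check, assuming the FASTQ path is readable locally. The names `GZIP_MAGIC` and `is_gzipped` are made up for illustration and are not part of tombo.

```python
# Hypothetical helper, not part of tombo: detect gzip-compressed input
# regardless of the file extension.
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC
```

If `is_gzipped("fastq_pass/barcode77/fastq_pass_barcode77.fastq")` returns True, the file is still compressed even though its name ends in `.fastq`, which would probably explain the decode error.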

@sagnikbanerjee15 @kenneditodd Just to add, I also had that error. Unzipping the fastq.gz file should resolve that error.
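
If gunzip is not convenient (for example, the file lacks a `.gz` suffix and gunzip refuses it), the same decompression can be done from Python with the standard library. This is only a sketch under the assumption that the input really is gzip-compressed. The function name `gunzip_to` and the paths in the usage note are placeholders, not anything tombo provides.

```python
import gzip
import shutil


def gunzip_to(src_path, dst_path):
    """Decompress the gzip file at src_path into a plain file at dst_path."""
    # Stream in chunks so large FASTQ files are not loaded into memory at once.
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

For example, `gunzip_to("reads.fastq.gz", "reads.fastq")` writes an uncompressed copy, and the resulting plain-text file can then be passed to `--fastq-filenames`.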