StevenWingett/FastQ-Screen

FastQC on output

Closed this issue · 5 comments

I'm trying to run FastQC on the output of fastq_screen --filter and am receiving errors about a corrupted FASTQ file.

I ran the following on the test dataset.

First, I tagged the FASTQ file:

FastQ-Screen-0.15.2/fastq_screen --tagged fqs_test_dataset.fastq.gz

Then I filtered out reads that mapped to yeast:

FastQ-Screen-0.15.2/fastq_screen --filter ---0 fqs_test_dataset.tagged.fastq.gz

Then I ran FastQC:

FastQC/fastqc -t 8 fqs_test_dataset.tagged_filter.fastq

I receive the following error.

Failed to process fqs_test_dataset.tagged_filter.fastq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:89)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:121)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)

I checked, and FastQC runs on both the original file and the tagged file. Only the filtered file appears to be corrupted. Is there any way I can either recover the integrity of this filtered FASTQ file or ensure that the filtered file is written without corruption, so that I can run FastQC and use the file for other downstream analysis?

Hi,

FastQ Screen should not corrupt FASTQ files. It would be nice to learn what is happening.

Are you able to look inside the corrupted FASTQ file and view the problematic read or reads? If so, are you able to extract one of these reads? Would you be able to send me a copy of this read before processing with FastQ Screen and after processing?

Thanks,
Steven
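
For anyone who wants to look inside a file that FastQC rejects, here is a minimal sketch of one way to locate the first offending read. It is not part of FastQ Screen or FastQC; the function name first_bad_record is hypothetical, and it assumes the common layout of four lines per FASTQ record. It reads the file as raw bytes, exactly as the file name implies, and reports the first ID line that does not start with '@'.

```python
def first_bad_record(path, preview_bytes=60):
    """Return (record_number, id_line_preview) for the first record whose
    ID line does not start with '@', or None if all ID lines look valid.

    Hypothetical helper; assumes four lines per FASTQ record.
    The file is read as raw bytes, with no decompression.
    """
    with open(path, "rb") as handle:
        for line_number, line in enumerate(handle):
            # Lines 0, 4, 8, ... are the ID lines of successive records.
            if line_number % 4 == 0 and not line.startswith(b"@"):
                preview = line.rstrip(b"\r\n")[:preview_bytes]
                return line_number // 4 + 1, preview
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_fastq.py some_file.fastq
    print(first_bad_record(sys.argv[1]))
```

If the reported preview begins with the bytes \x1f\x8b, the file content is gzip-compressed rather than damaged plain text.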

Hi,

I also ran into this issue. It looks like FastQ Screen produces gzipped FASTQ output files from gzipped input files in the filtering step but omits the .gz extension from the file names. FastQC complains because, going by the file name, it expects uncompressed data. Adding .gz to the file names solved the problem for me.

Cheers,
Thorsten
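
The workaround above can be sketched as a small script: check whether a file starts with the gzip magic bytes (0x1f 0x8b) and, if its name lacks .gz, rename it. This is only an illustration under stated assumptions, not part of FastQ Screen; the function names is_gzipped and add_gz_suffix_if_needed are hypothetical.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file content starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def add_gz_suffix_if_needed(path):
    """Rename path to path + '.gz' when the content is gzipped but the
    name does not end in '.gz'. Returns the path the file has afterwards.

    Hypothetical helper; refuses to overwrite an existing target file.
    """
    if path.endswith(".gz") or not is_gzipped(path):
        return path
    new_path = path + ".gz"
    if os.path.exists(new_path):
        raise FileExistsError(new_path)
    os.rename(path, new_path)
    return new_path


if __name__ == "__main__":
    import sys

    # Usage: python fix_gz_names.py file1.fastq file2.fastq ...
    for name in sys.argv[1:]:
        print(add_gz_suffix_if_needed(name))
```

Checking the content rather than the name means files that really are plain text are left untouched.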

Hi Thorsten,

Thanks for that message. That would cause a problem, and I'll see if I can replicate it here. Just to check that I am understanding correctly: the final filtered FASTQ file has the file extension .fastq when it should be .fastq.gz, since the file is gzipped?

Thanks,
Steven

Hi Steven,

That's right. I had a look at your code, and it looks like the cause is that the variable $zip_data_output is not set correctly when filtering is performed independently (without tagging) on already tagged, gzipped FASTQ files. In this case, process_tag_files is called as early as line 171 of the fastq_screen script, whereas the variable is only set to 1 further below. In addition, this function uses variables such as $pass and $inverse that also seem to be checked or initialized only further down in the code (when running without --inverse). For me, the latter caused a lot of warnings (Use of uninitialized value $inverse in subtraction (-) at fastq_screen line 689, <IN_PROCESS_TAG> line XXX.) that seemingly did not affect the result.

I hope I got this right and that it helps to solve the issue.

Thanks and best,
Thorsten

Hi,

Thanks for your feedback. I believe this bug is now fixed in the latest release:
https://github.com/StevenWingett/FastQ-Screen/releases/tag/v0.15.3

Please let me know if you experience any further issues with FastQ Screen.

Thanks,
Steven