soedinglab/plass

Paired read prediction - mergereads failed

dnolin13 opened this issue · 3 comments

Expected Behavior

Hello,
I am trying to run PLASS on a curated set of marine viral metagenomic reads. I have two paired-end read files, and running PLASS on them fails with the following error:

Start merging reads.
Segmentation fault (core dumped)
Error: mergereads failed
deactivate does not accept arguments
remainder_args: ['PLASS']

Current Behavior

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

Here is the script I am using for the plass assembler:

conda activate PLASS

/home/delaney/miniconda3/envs/PLASS/bin/plass assemble /home/delaney/5YV/test2/02-kraken2-viruses-only/extracted1.fq /home/delaney/5YV/test2/02-kraken2-viruses-only/extracted2.fq assembly.fas tmp

conda deactivate PLASS

Plass Output (for bugs)

Please make sure to also post the complete output of Plass. You can use gist.github.com for large output.

Include only extendable true
Skip repeating k-mers true
Min codons in orf 45
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Protein Filter Threshold 0.2
Filter Proteins 1
Search iterations 12
Delete temporary files incremental 1
Remove temporary files false
MPI runner
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 1

PAIRED END MODE
mergereads /home/delaney/5YV/test2/02-kraken2-viruses-only/extracted1.fq /home/delaney/5YV/test2/02-kraken2-viruses-only/extracted2.fq /home/delaney/5YV/test2/scripts/tmp/1996441830643183315/nucl_reads -v 3

Start merging reads.
Segmentation fault (core dumped)
Error: mergereads failed
deactivate does not accept arguments
remainder_args: ['PLASS']

Context

Providing context helps us come up with a solution and improve our documentation for the future.

These are viral metagenomic reads sequenced on a NovaSeq. They were identified as viral using Kraken2, and the viral reads from my dataset were then written to the two files extracted1.fq and extracted2.fq (forward and reverse).

Your Environment

I am running this in a conda environment on a Linux machine, where I installed Plass via Bioconda.

Is it possible for you to share your input reads? My first guess would be a problem with the quality strings in the FASTQ files. Which Plass version are you using?
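One way to check the quality-string hypothesis locally is a small validator. This is a minimal sketch, not part of Plass; it assumes unwrapped 4-line FASTQ records (header, sequence, `+` separator, quality) with Phred+33 encoding, and the function name `fastq_problems` is hypothetical.

```python
import sys
from typing import List, Tuple

# Assumption: Phred+33 encoding, where quality characters run from '!' (Q0) to '~' (Q93).
MIN_QUAL, MAX_QUAL = ord("!"), ord("~")


def fastq_problems(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (record_number, message) for each malformed 4-line FASTQ record."""
    problems = []
    # Drop trailing blank lines so a final newline is not reported as a broken record.
    while lines and not lines[-1].strip():
        lines = lines[:-1]
    if len(lines) % 4 != 0:
        problems.append((len(lines) // 4 + 1, "file ends with an incomplete record"))
    # Only inspect complete 4-line records.
    for i in range(0, len(lines) - len(lines) % 4, 4):
        rec = i // 4 + 1
        header, seq, plus, qual = (line.rstrip("\r\n") for line in lines[i:i + 4])
        if not header.startswith("@"):
            problems.append((rec, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((rec, "separator line does not start with '+'"))
        if len(qual) != len(seq):
            problems.append((rec, f"quality length {len(qual)} != sequence length {len(seq)}"))
        if any(not MIN_QUAL <= ord(c) <= MAX_QUAL for c in qual):
            problems.append((rec, "quality string has characters outside Phred+33 range"))
    return problems


if __name__ == "__main__":
    # usage (hypothetical script name): python check_fastq.py extracted1.fq
    with open(sys.argv[1]) as handle:
        for rec, msg in fastq_problems(handle.readlines()):
            print(f"record {rec}: {msg}")
```

A FASTA file passed through this check fails on the header and separator lines, which is a quick hint that the input is not FASTQ at all.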

Sure, I can provide a few of the reads if that would help. I had to attach the .fq files as .txt files to upload them here.

I'm not sure which version of Plass I have, but I installed it last week via Bioconda. Thanks for the help!
subsetForGithubRev.txt
subsetForGithub.txt

You mentioned .fq files, but the read files you provided are in FASTA format. In its first step, Plass uses the FLASH code to merge paired-end reads, and FLASH needs the quality strings of the FASTQ format to do so. When you pass two input files, merging fails if those quality lines are missing. With the current code on GitHub, the same input would instead stop with the error message "Invalid sequence record found".
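Because the .fq extension says nothing about the actual content, it can help to look at the first record before choosing between paired-end and single-file mode. The sketch below is a hypothetical helper (`sniff_read_format` is not a Plass function); it only inspects the first non-empty line, assuming FASTA records start with `>` and FASTQ records with `@`, and does not reproduce how Plass itself detects formats.

```python
import sys
from typing import Dict, Iterable


def sniff_read_format(lines: Iterable[str]) -> str:
    """Guess 'fasta' or 'fastq' from the first non-empty line of a read file."""
    for line in lines:
        if line.strip():
            if line[0] == ">":
                return "fasta"
            if line[0] == "@":
                return "fastq"
            return "unknown"
    return "empty"


if __name__ == "__main__":
    # usage (hypothetical script name): python sniff_reads.py extracted1.fq extracted2.fq
    formats: Dict[str, str] = {}
    for path in sys.argv[1:]:
        with open(path) as handle:
            formats[path] = sniff_read_format(handle)
    for path, fmt in formats.items():
        print(f"{path}: {fmt}")
    if len(formats) == 2 and any(fmt != "fastq" for fmt in formats.values()):
        print("Two inputs but not both FASTQ: paired-end merging needs quality strings.")
```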

If you can obtain the quality strings, call Plass with the two paired-end files in FASTQ format. If not, provide Plass with a single FASTA file. To produce that file, you can either merge your paired-end reads beforehand with another tool (if one exists that works without quality strings) or simply concatenate the two files, which discards the pairing information.
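For well-formed files, plain `cat extracted1.fa extracted2.fa > reads.fa` does the concatenation. The sketch below does the same in Python while guarding against one common pitfall: if the first file lacks a trailing newline, its last sequence line would fuse with the next file's `>` header. The function and script names are hypothetical; the pairing information is simply dropped, as described above.

```python
import sys
from typing import Iterable, Iterator


def concat_lines(files: Iterable[Iterable[str]]) -> Iterator[str]:
    """Stream the lines of each FASTA input in order, ignoring read pairing."""
    for lines in files:
        last = ""
        for line in lines:
            yield line
            last = line
        # Ensure the next file's header starts on its own line.
        if last and not last.endswith("\n"):
            yield "\n"


if __name__ == "__main__":
    # usage (hypothetical script name):
    #   python concat_fasta.py extracted1.fa extracted2.fa > reads.fa
    for path in sys.argv[1:]:
        with open(path) as handle:
            sys.stdout.writelines(concat_lines([handle]))
```

The combined file can then be passed as the single input, following the same pattern as the original command, e.g. `plass assemble reads.fa assembly.fas tmp`.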