gtonkinhill/panaroo

Bakta GFFs with --remove-invalid-genes still throw 'invalid gene!' error, doesn't get to alignment

amyecampbell opened this issue · 2 comments

Hello! Thank you for developing and maintaining Panaroo.

Apologies if this has been answered; I wasn't able to find a report of this exact problem in which --remove-invalid-genes was already being used.
I am running Panaroo 1.4.2 on the *.gff3 files output by Bakta 1.8.2, using the input file list method (providing a .txt file of paths). I think it gets through most of the pre-alignment steps, since the following files are written:

  • 152M combined_DNA_CDS.fasta
  • 1.8M combined_protein_cdhit_out.txt
  • 3.8M combined_protein_cdhit_out.txt.clstr
  • 124M combined_protein_CDS.fasta
  • 9.8M final_graph.gml
  • 291M gene_data.csv
  • 2.7M gene_presence_absence.csv
  • 2.6M gene_presence_absence_roary.csv
  • 120K gene_presence_absence.Rtab
  • 1.4M pan_genome_reference.fa
  • 9.8M pre_filt_graph.gml
  • 80K struct_presence_absence.Rtab
  • 24K summary_statistics.txt

However, after "writing output..." from the cd-hit steps, even with the --remove-invalid-genes flag, I get the following message for 3 different .gff3 files and then panaroo quits (without any other error messages) before starting an alignment:
"invalid gene! file - id: GFFpath/ERR8575034.gff3 - DJLNOM_13595
Length: 27 , Has stop: False"

I have used the following command, and I'm attaching one of the files I got this error for.

panaroo -i GFFlistUpdated.txt -o panaroo_output --remove-invalid-genes --clean-mode strict --search_radius 5000 --refind_prop_match .5 -f .7 -c .98 -t 16 --aligner prank --core_threshold .95

ERR8575034.gff3.txt

Am I missing something very obvious? Is there a flag I need to add in order to even get to the alignment step?
Thank you!

Hi

It looks like you might be missing either -a core or -a pan. Without one of them, you are setting the alignment parameters (--aligner, --core_threshold) but not asking Panaroo to perform the alignment.
My guess is that the message being reported is coming from the initial stages of the Panaroo run but is not being flushed to your terminal until the run finishes.
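
For reference, here is a sketch of the original command with an alignment requested. The only change is the added -a core; every other option and file name is copied from the command in the question. Choosing core rather than pan is an assumption about which alignment is wanted.

```shell
# Original command from the question, plus "-a core" to request a core-genome alignment.
# Swap in "-a pan" to align all gene clusters instead.
panaroo -i GFFlistUpdated.txt -o panaroo_output \
  --remove-invalid-genes --clean-mode strict \
  --search_radius 5000 --refind_prop_match .5 -f .7 -c .98 -t 16 \
  -a core --aligner prank --core_threshold .95
```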

If this isn't the case, something strange may be going on with the file paths. In these situations I usually check for paths containing whitespace or special characters.
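
One way to run that path check over the input list is sketched below. The function name check_gff_list is hypothetical, and the "safe" character set (letters, digits, dot, underscore, slash, hyphen) is a conservative assumption, not a rule taken from Panaroo. It also reports listed paths that do not exist, and a stray carriage return from Windows line endings shows up as a suspicious character.

```shell
# Hypothetical helper: scan a file of GFF paths (one per line) and report
# entries with unusual characters or entries that do not point to a file.
# Returns 0 if every entry looks fine, 1 otherwise.
check_gff_list() {
  list="$1"
  status=0
  # "|| [ -n "$path" ]" keeps the last line even without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    # Skip empty lines.
    [ -z "$path" ] && continue
    case "$path" in
      # Anything outside the assumed safe set: spaces, tabs, quotes, CR, etc.
      *[!A-Za-z0-9._/-]*)
        echo "suspicious characters: $path"
        status=1
        ;;
    esac
    if [ ! -f "$path" ]; then
      echo "missing file: $path"
      status=1
    fi
  done < "$list"
  return "$status"
}
```

With the list from the question, the call would be check_gff_list GFFlistUpdated.txt; no output and a zero exit status mean nothing suspicious was found under these assumptions.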

I hope this helps and let me know if you're still having an issue.

ahhh I had a feeling it was something silly like that. Thank you!