gtonkinhill/panaroo

gene presence/absence file having ";" without merging paralogs

mhjonathan opened this issue · 3 comments

Hi,

I'm running Panaroo on GFF3 files in strict mode, without merging paralogs.

panaroo \
  --threads 20 \
  --input {input} \
  --out_dir {out_dir} \
  --clean-mode strict \
  --remove-invalid-genes \
  --threshold 0.98 \
  --family_threshold 0.7

When I check the gene_presence_absence.csv file, there are some entries like the ones below (marked in yellow):

image

They are not paralogs; there is almost no similarity between those entries. Can you tell me what they are?

Hi,

This is usually caused by fragmented genes which Panaroo will merge together.
Depending upon the reading frame they were originally called in, they can look different from the other genes in the cluster, so it is important to also consider the DNA sequence when comparing them.
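
To illustrate the reading-frame point, here is a minimal sketch showing that the same DNA translated in two different frames can give proteins that look unrelated. This is not Panaroo code: the `translate` helper and the toy sequence are made up for illustration, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1 or 2).

    Trailing partial codons are ignored; unknown codons become 'X'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Hypothetical toy sequence, not taken from the issue.
toy = "ATGGCCAAGTAA"
print(translate(toy, 0))  # MAK*
print(translate(toy, 1))  # WPS
```

Two fragments of one gene called in different frames can therefore share little protein similarity while still overlapping at the DNA level.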

Hi, thank you for the answer.

Then how can I deal with this problem? I expected fragmented genes to be filtered out by the --remove-invalid-genes option. Clustering a gene together with fragmented genes that have no connection to each other could lead to incorrect conclusions, right?

Hi,

Sorry, I missed your reply. The --remove-invalid-genes option filters out invalid GFF entries but not fragmented gene calls. Instead, Panaroo merges these together, using ';' as the delimiter.
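
For anyone who wants to list the affected entries, here is a minimal sketch that scans the table for cells containing ';'. It assumes the first three columns of gene_presence_absence.csv are metadata (Gene, Non-unique Gene name, Annotation) and that every later column is one sample; adjust `n_meta_cols` if your file differs. The function name `find_merged_entries` is hypothetical and not part of Panaroo.

```python
import csv


def find_merged_entries(csv_path, n_meta_cols=3, delimiter=";"):
    """Return (gene, sample, [gene IDs]) for every cell holding more than one ID.

    n_meta_cols is an assumption about the file layout: the number of
    leading metadata columns before the per-sample columns start.
    """
    hits = []
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        samples = header[n_meta_cols:]
        for row in reader:
            gene = row[0]
            for sample, cell in zip(samples, row[n_meta_cols:]):
                if delimiter in cell:
                    hits.append((gene, sample, cell.split(delimiter)))
    return hits


# Example usage (path is a placeholder):
# for gene, sample, ids in find_merged_entries("gene_presence_absence.csv"):
#     print(gene, sample, ids)
```

The returned IDs can then be looked up in the input GFF files to compare the underlying DNA sequences rather than the protein translations.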