gtonkinhill/panaroo

Continue from concatenate_core_genome_alignments?

qianxuans opened this issue · 4 comments

Hi,
Thank you so much for maintaining panaroo.
I was wondering whether I can use the alignment files in /core_genome_alignments/ directly to concatenate the core genome alignment.

I first ran panaroo with the command:
panaroo -i /panaroo/gff3/*.gff3 -o /panaroo/output/ --remove-invalid-genes --clean-mode strict --alignment core --core_threshold 0.95 --aligner prank --family_threshold 0.7 --refind_prop_match 0.5 --search_radius 5000 --threads 48
However, for some reason panaroo stopped at the alignment step, and I ended up with the following output files:

2.4G Mar  5 10:28 combined_DNA_CDS.fasta
7.7M Mar  5 09:52 combined_protein_cdhit_out.txt
90M Mar  5 09:52 combined_protein_cdhit_out.txt.clstr
814M Mar  5 10:28 combined_protein_CDS.fasta
258M Mar  5 10:33 final_graph.gml
3.3G Mar  5 10:28 gene_data.csv
66M Mar  5 10:32 gene_presence_absence.csv
64M Mar  5 10:32 gene_presence_absence_roary.csv
9.6M Mar  5 10:32 gene_presence_absence.Rtab
3.6M Mar  5 10:32 pan_genome_reference.fa
183M Mar  5 09:57 pre_filt_graph.gml
6.8M Mar  5 10:32 struct_presence_absence.Rtab
201 Mar  5 10:32 summary_statistics.txt

and then I continued to run the command:
panaroo-msa -o /panaroo/output/ --verbose --alignment core --core_threshold 0.95 --aligner prank --threads 40

However, I have been stuck for 30 days since the last alignment file was produced in output/aligned_gene_sequences/.
Prank is probably still running, but I was wondering whether there is any way to use all the files in /aligned_gene_sequences/ and go on to concatenate the core genome alignment?

Best,
Sean

Hi Sean,

I'm afraid we don't have a script to do this independently at the moment. I usually use Mafft when building alignments so have not run into this issue with Prank before.

In case it is helpful, the function concatenate_core_genome_alignments in the file generate_output.py handles the concatenation in Panaroo.
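
For readers who want to see what such a concatenation step involves, here is a minimal standalone sketch. It is not Panaroo's concatenate_core_genome_alignments and does not reproduce its behaviour. The names parse_fasta and concatenate_alignments are hypothetical, and the sketch assumes that each per-gene alignment is a FASTA file whose headers begin with the isolate name followed by ";". Isolates missing from a gene are padded with gaps so that every output row has the same length.

```python
from collections import defaultdict


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate_alignments(gene_alignments, isolates):
    """Join per-gene alignments column-wise, padding absent isolates with gaps.

    gene_alignments: dict mapping gene name -> FASTA text of that gene's alignment.
    isolates: list of isolate names to include in the output.
    Assumption: each header looks like "isolate;gene_id".
    """
    concatenated = defaultdict(list)
    for gene in sorted(gene_alignments):  # fixed gene order for reproducibility
        records = parse_fasta(gene_alignments[gene])
        if not records:
            continue
        length = len(records[0][1])
        by_isolate = {}
        for header, seq in records:
            if len(seq) != length:
                raise ValueError(f"{gene}: aligned sequences differ in length")
            # Keep only the first sequence seen for each isolate.
            by_isolate.setdefault(header.split(";")[0], seq)
        for isolate in isolates:
            # An isolate without this gene contributes a run of gap characters.
            concatenated[isolate].append(by_isolate.get(isolate, "-" * length))
    return {isolate: "".join(parts) for isolate, parts in concatenated.items()}
```

To use it on real output, the text of each finished per-gene alignment file would be read into the gene_alignments dictionary, keyed by gene name and restricted to the genes that count as core. Alignment files that Prank had not finished writing would need to be excluded first.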

Hi Gerry,
Thank you so much for this great advice!
In the function concatenate_core_genome_alignments, what does the parameter "core_names" refer to?
Do we still need the function generate_core_genome_alignment to generate the core_gene_alignment_filtered.aln?

I've been asked this before, and helped another user prepare a concatenate_core_genome.py script.

It is available here if it helps: https://github.com/sophbel/LFI_between_country_migration/tree/main/PreProcessing/GPS_Panaroo
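
If "core_names" is taken to mean the names of the genes that pass the core threshold (an assumption; the thread does not confirm it), one way to derive such a list is from gene_presence_absence.Rtab. The sketch below is hypothetical and is not taken from Panaroo or from the linked script. It assumes the Rtab file is tab-separated with a header row, gene names in the first column, and 0/1 presence values in one column per isolate. Panaroo's own rule may differ, for example in how it treats paralogs or ties at the threshold.

```python
import csv
import io


def core_genes_from_rtab(rtab_text, core_threshold=0.95):
    """Return the genes present in at least core_threshold of the isolates.

    rtab_text: contents of a tab-separated presence/absence table whose first
    column holds gene names and whose other columns hold 0/1 per isolate
    (assumed layout).
    """
    reader = csv.reader(io.StringIO(rtab_text), delimiter="\t")
    header = next(reader)
    n_isolates = len(header) - 1  # every column except the gene name
    core = []
    for row in reader:
        if not row:
            continue
        # Count isolates whose cell is neither empty nor "0".
        present = sum(1 for value in row[1:] if value.strip() not in ("", "0"))
        if n_isolates and present / n_isolates >= core_threshold:
            core.append(row[0])
    return core
```

The threshold default of 0.95 mirrors the --core_threshold value used in the commands earlier in this thread.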

Hi nzmacalasdair,
thank you so much for providing the script!!
This script works perfectly!