Continue from concatenate_core_genome_alignments?
qianxuans opened this issue · 4 comments
Hi,
Thank you so much for maintaining panaroo.
I was wondering whether I can use the alignment files in /core_genome_alignments/ directly to build the concatenated core genome alignment.
I first ran panaroo with command:
panaroo -i /panaroo/gff3/*.gff3 -o /panaroo/output/ --remove-invalid-genes --clean-mode strict --alignment core --core_threshold 0.95 --aligner prank --family_threshold 0.7 --refind_prop_match 0.5 --search_radius 5000 --threads 48
However, for some reason Panaroo stopped at the alignment step, and I ended up with the following output files:
2.4G Mar 5 10:28 combined_DNA_CDS.fasta
7.7M Mar 5 09:52 combined_protein_cdhit_out.txt
90M Mar 5 09:52 combined_protein_cdhit_out.txt.clstr
814M Mar 5 10:28 combined_protein_CDS.fasta
258M Mar 5 10:33 final_graph.gml
3.3G Mar 5 10:28 gene_data.csv
66M Mar 5 10:32 gene_presence_absence.csv
64M Mar 5 10:32 gene_presence_absence_roary.csv
9.6M Mar 5 10:32 gene_presence_absence.Rtab
3.6M Mar 5 10:32 pan_genome_reference.fa
183M Mar 5 09:57 pre_filt_graph.gml
6.8M Mar 5 10:32 struct_presence_absence.Rtab
201 Mar 5 10:32 summary_statistics.txt
I then tried to resume with:
panaroo-msa -o /panaroo/output/ --verbose --alignment core --core_threshold 0.95 --aligner prank --threads 40
However, it has now been 30 days since the last alignment file was written to output/aligned_gene_sequences/.
PRANK is probably still running, but is there any way I can take the files already in /aligned_gene_sequences/ and concatenate them into the core genome alignment?
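For reference, the concatenation step itself can be sketched roughly as follows. This is a minimal sketch, not Panaroo's concatenate_core_genome_alignments. It assumes each file in aligned_gene_sequences/ is an aligned FASTA for one gene, that the isolate name is the part of each header before the first ";", and that files match "*.aln.fas" (all hypothetical; check your own files). Isolates missing a gene are padded with gaps, and only the first copy of a multi-copy gene is kept.

```python
import glob
import os


def parse_fasta(text):
    """Parse FASTA text into a list of (record_id, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record id = header text up to the first whitespace.
            records.append(((line[1:].split() or [""])[0], []))
        elif records:
            records[-1][1].append(line)
    return [(name, "".join(chunks)) for name, chunks in records]


def load_alignment(text):
    """Map isolate -> aligned sequence, keeping the first record per isolate."""
    per_isolate = {}
    for record_id, seq in parse_fasta(text):
        # ASSUMPTION: the isolate name is the part of the id before the first ";".
        isolate = record_id.split(";")[0]
        per_isolate.setdefault(isolate, seq)
    return per_isolate


def concatenate(alignments, isolates):
    """Join per-gene alignments; isolates lacking a gene are padded with gaps."""
    parts = {iso: [] for iso in isolates}
    for gene in sorted(alignments):  # alphabetical gene order (arbitrary choice)
        aln = alignments[gene]
        if not aln:
            continue  # skip empty files
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: sequences differ in length; file may be incomplete")
        width = widths.pop()
        for iso in isolates:
            parts[iso].append(aln.get(iso, "-" * width))
    return {iso: "".join(chunks) for iso, chunks in parts.items()}


def concatenate_directory(aln_dir, out_path, pattern="*.aln.fas"):
    """Concatenate every alignment in aln_dir matching pattern into out_path."""
    # ASSUMPTION: the file pattern is hypothetical; change it to match your files.
    alignments = {}
    for path in glob.glob(os.path.join(aln_dir, pattern)):
        with open(path) as handle:
            alignments[os.path.basename(path)] = load_alignment(handle.read())
    isolates = sorted({iso for aln in alignments.values() for iso in aln})
    merged = concatenate(alignments, isolates)
    with open(out_path, "w") as handle:
        for iso in isolates:
            handle.write(f">{iso}\n{merged[iso]}\n")
    return len(alignments), len(isolates)
```

A call such as concatenate_directory("output/aligned_gene_sequences", "core_partial.aln") (output name hypothetical) writes one concatenated sequence per isolate. Genes PRANK never finished are simply absent, so the result covers only the genes aligned so far, and the most recently written file is worth inspecting in case it is truncated.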
Best,
Sean
Hi Sean,
I'm afraid we don't currently have a standalone script for this. I usually use MAFFT when building alignments, so I haven't run into this issue with PRANK before.
In case it is helpful, the function concatenate_core_genome_alignments in generate_output.py handles the concatenation in Panaroo.
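Because the run stalled part-way, one way to start is to check which core genes still have no alignment file. A minimal sketch, assuming (hypothetically) that each alignment file is named after its gene plus a ".aln.fas" suffix and that a list of core gene names is already available:

```python
def missing_alignments(core_genes, filenames, suffix=".aln.fas"):
    """Return the core genes that have no matching alignment file yet."""
    # ASSUMPTION: files are named "<gene><suffix>"; adjust suffix to your files.
    done = {name[: -len(suffix)] for name in filenames if name.endswith(suffix)}
    return sorted(set(core_genes) - done)
```

For example, missing_alignments(core_genes, os.listdir("output/aligned_gene_sequences")) lists the genes that would be absent from a concatenation of the files produced so far.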
Hi Gerry,
Thank you so much for this great advice!
In the function concatenate_core_genome_alignments, what does the parameter "core_names" refer to?
Do we still need the function generate_core_genome_alignment to generate core_gene_alignment_filtered.aln?
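On the core_names question, one reading (an assumption, not confirmed in this thread) is that it holds the names of the core genes whose alignments are concatenated. Under that assumption, a core gene list could be recovered from gene_presence_absence.Rtab, which is tab-separated with the gene name in the first column and one 0/1 column per isolate. The sketch keeps genes present in at least core_threshold of isolates, mirroring the --core_threshold 0.95 used above; Panaroo's own cut-off rule may differ in detail.

```python
import csv
import io


def core_genes_from_rtab(text, core_threshold=0.95):
    """Return gene names present in at least core_threshold of isolates."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    n_isolates = len(header) - 1  # first column holds the gene name
    core = []
    for row in reader:
        if not row:
            continue
        # Count isolates with a non-zero presence value for this gene.
        present = sum(1 for v in row[1:] if v.strip() not in ("", "0"))
        if present >= core_threshold * n_isolates:
            core.append(row[0])
    return core
```

Reading the file with open("gene_presence_absence.Rtab").read() and passing the text in gives a list that could serve as core_names, if that reading of the parameter is right.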
I've been asked this before and helped another user prepare a concatenate_core_genome.py script.
It is available here if it helps: https://github.com/sophbel/LFI_between_country_migration/tree/main/PreProcessing/GPS_Panaroo
Hi nzmacalasdair,
Thank you so much for providing the script!
It works perfectly!