doubt about Circular contig identifiers
msubirana opened this issue · 4 comments
I don't understand the 3rd field in organisms_fasta_list, which is described as:
"Circular contig identifiers are indicated in the following columns"
I have a list of fna from Streptococcus suis:
SS_0001 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0001.fna
SS_0007 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0007.fna
SS_0008 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0008.fna
SS_0011 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0011.fna
SS_0016 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0016.fna
SS_0039 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0039.fna
SS_0042 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0042.fna
SS_0059 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0059.fna
SS_0062 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0062.fna
SS_0063 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0063.fna
SS_0064 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0064.fna
SS_0073 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0073.fna
Which is the circular contig in this case?
I'm also not sure how to use this parameter:
--anno ORGANISMS_ANNOTATION_LIST
Thanks in advance!
Hi @msubirana, you only need to worry about this field if you know you have circular contigs in your genomes.
If you know which contigs in your genomes are circular, you can provide them in a tab-delimited list after the path of your input fasta. You can see an example here: example file. In this example, on line 4 you can see "NC_017436.1 NC_017433.1"; these are the contig identifiers for the circular contigs in that assembly.
If you have complete genomes and you know all the contigs are circular, you can use this helper function from a little R package I put together: https://github.com/Jtrachsel/pdtools#generate-an-input-file-for-caclulating-a-pangenome-with-ppanggolin
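If you would rather write the list by hand, here is a minimal Python sketch of the layout described above: genome name, FASTA path, then any circular contig identifiers, all tab-separated. The function name, the placeholder paths, and the genome name "genome_X" are hypothetical; only the two contig identifiers come from the example file mentioned above.

```python
from typing import Iterable, Sequence, Tuple

# One entry per genome: (name, path to FASTA, circular contig IDs).
Entry = Tuple[str, str, Sequence[str]]


def format_fasta_list(entries: Iterable[Entry]) -> str:
    """Return tab-delimited lines: name, FASTA path, then circular contig IDs.

    Genomes with no known circular contigs simply get two fields.
    """
    lines = []
    for name, fasta_path, circular_contigs in entries:
        fields = [name, fasta_path, *circular_contigs]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical entries: placeholder paths, and "genome_X" is an invented name.
entries = [
    ("SS_0001", "/path/to/SS_0001.fna", []),
    ("genome_X", "/path/to/genome_X.fna", ["NC_017436.1", "NC_017433.1"]),
]
print(format_fasta_list(entries), end="")
```

Leaving the third field out, as in the first entry, corresponds to the case where no circular contigs are declared for that genome.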
Thanks! I don't know whether my genomes are circular. How can I check?
I ran the pipeline with the organisms_fasta_list that I showed previously and got this error:
Exception: The gene family has not beed associated to a partition
Hello,
For the circularity of genomes: if you downloaded them from somewhere, the website likely states whether they are circular. If it is internal data, the person most likely to know is the one who assembled the genomes. You cannot really tell from a fasta file alone.
Overall, if you don't know, I'd recommend just ignoring that aspect; it won't change much in the end.
For the error, what exact command line did you use?
My assumption is that you did not use the minimal command line "ppanggolin workflow --fasta ORGANISMS_FASTA_LIST", which you should likely use in your case (if you only have the 'organisms_fasta_list' file).
Adelme
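For anyone scripting this, a minimal Python sketch of calling the minimal command line quoted above. The helper names and the file name "organisms.fasta.list" are hypothetical; the only PPanGGOLiN arguments used are the ones given in this thread (`workflow --fasta`).

```python
import shutil
import subprocess
from typing import List


def build_workflow_command(fasta_list: str) -> List[str]:
    """Build the minimal command line quoted in this thread."""
    return ["ppanggolin", "workflow", "--fasta", fasta_list]


def run_workflow(fasta_list: str) -> int:
    """Run the workflow and return its exit code.

    Raises FileNotFoundError if ppanggolin is not on PATH.
    """
    if shutil.which("ppanggolin") is None:
        raise FileNotFoundError("ppanggolin executable not found on PATH")
    return subprocess.run(build_workflow_command(fasta_list), check=False).returncode


# Example (hypothetical file name):
# exit_code = run_workflow("organisms.fasta.list")
```

Passing the arguments as a list avoids shell quoting problems if the list file path contains spaces.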
@msubirana @axbazin May I close this issue?