labgem/PPanGGOLiN

Projection file returns spots for genes not in RGPs


I noticed that the projection file can report spots for genes that are not listed as belonging to any RGP. My understanding from the panRGP manuscript is that spots are, by definition, groups of RGPs found at the same genomic location, so a gene outside any RGP should not have a spot. I was wondering whether this is a bug or whether I have misunderstood the spot concept. I have linked my output files below. The projections were created using ppanggolin write --projection. Thanks in advance!
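A quick way to list the affected genes is to scan the projection file for rows that have a Spots value but no RGPs value. The sketch below is a hypothetical helper, not part of PPanGGOLiN. It assumes that the file is tab-separated with the column names shown further down in this issue ("gene", "RGPs", "Spots"), and that an empty field, "None", or "NA" means "no value". Check these assumptions against your own output.

```python
import csv
import io

# Assumption: these strings mark an empty RGPs/Spots field in the projection file.
MISSING = {"", "None", "NA"}


def genes_with_spot_but_no_rgp(tsv_text):
    """Return IDs of genes that list at least one spot although they belong to no RGP."""
    flagged = []
    # DictReader keys each row by the header line, so extra columns are simply ignored.
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        no_rgp = row["RGPs"].strip() in MISSING
        has_spot = row["Spots"].strip() not in MISSING
        if no_rgp and has_spot:
            flagged.append(row["gene"])
    return flagged


# Toy input with hypothetical gene IDs (not taken from the linked files).
sample = "gene\tRGPs\tSpots\ngeneA\tNone\t95\ngeneB\tRGP_0\t95\ngeneC\tNone\tNone\n"
print(genes_with_spot_but_no_rgp(sample))  # geneA is the only gene with a spot but no RGP
```

To run it on a real file, pass in the file's contents, for example open(path).read().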

Pangenome file: https://drive.proton.me/urls/R2AMAWJW3W#qZuCxaOzf2o9
Projection file, where the first row has spots but no RGPs: https://drive.proton.me/urls/QX64FPZK5M#xEZFoUROFMDp

Perhaps related: a gene in a given RGP can list spots that other genes in the same RGP do not. For example, in the rows below, which come from the projection file linked above, the first two genes list a single spot (95) while the third lists several (95,82,75,89), even though all three belong to the same RGP.

gene contig start stop strand family nb_copy_in_org partition persistent_neighbors shell_neighbors cloud_neighbors RGPs Spots Modules
BUR30_RS22995 NZ_FRFY01000079.1 336 1058 - BUR30_RS22995 1 cloud 0 0 0 NZ_FRFY01000079.1_RGP_0 95 None
BUR30_RS23000 NZ_FRFY01000079.1 1208 1552 + BUR30_RS23000 1 cloud 0 0 0 NZ_FRFY01000079.1_RGP_0 95 None
BUR30_RS23005 NZ_FRFY01000079.1 1555 1989 + BUI81_RS18235 1 cloud 0 0 0 NZ_FRFY01000079.1_RGP_0 95,82,75,89 module_67
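If a gene's spot comes from its RGP, every gene in one RGP should report the same Spots value. The hypothetical check below groups rows by RGP and reports any RGP whose genes disagree. It makes the same assumptions as before: the file is tab-separated, the column names match the header above, and "", "None", or "NA" means "no value". The sample keeps only the gene, RGPs, and Spots columns from the three rows quoted above.

```python
import csv
import io
from collections import defaultdict

# Assumption: these strings mark an empty RGPs field in the projection file.
MISSING = {"", "None", "NA"}


def rgps_with_inconsistent_spots(tsv_text):
    """Map each RGP whose genes report different Spots values to the set of values seen."""
    spots_by_rgp = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        rgp = row["RGPs"].strip()
        if rgp in MISSING:
            continue  # genes outside any RGP are not relevant to this check
        spots_by_rgp[rgp].add(row["Spots"].strip())
    # A consistent RGP contributes exactly one distinct Spots string.
    return {rgp: values for rgp, values in spots_by_rgp.items() if len(values) > 1}


# The three rows from this issue, reduced to the relevant columns.
sample = (
    "gene\tRGPs\tSpots\n"
    "BUR30_RS22995\tNZ_FRFY01000079.1_RGP_0\t95\n"
    "BUR30_RS23000\tNZ_FRFY01000079.1_RGP_0\t95\n"
    "BUR30_RS23005\tNZ_FRFY01000079.1_RGP_0\t95,82,75,89\n"
)
print(rgps_with_inconsistent_spots(sample))
```

For these rows, RGP_0 is reported with the two distinct values "95" and "95,82,75,89".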

Hi,

Indeed, the two problems are closely related: it looks like this field is filled by listing the spots that the gene's family belongs to, rather than the spot of the gene itself!
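The difference between the reported behaviour and the expected one can be shown with a toy model. This is a simplified sketch, not PPanGGOLiN's actual code. The Gene class, the spot_of_rgp mapping, and both functions are hypothetical names used only for illustration. Looking up spots through the gene family collects the spot of every RGP that contains any member of that family. That explains why a gene outside any RGP can show spots, and why genes in the same RGP can show different spots. Looking up the spot through the gene's own RGP avoids both effects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    name: str
    family: str
    rgp: Optional[str]  # None when the gene lies outside any RGP


def family_level_spots(gene, genes, spot_of_rgp):
    """Reported behaviour: spots of every RGP containing any member of the gene's family."""
    return sorted({
        spot_of_rgp[g.rgp]
        for g in genes
        if g.family == gene.family and g.rgp is not None and g.rgp in spot_of_rgp
    })


def gene_level_spots(gene, spot_of_rgp):
    """Expected behaviour: only the spot of the gene's own RGP (empty if there is none)."""
    # An RGP that is not part of any spot is simply absent from spot_of_rgp.
    if gene.rgp is None or gene.rgp not in spot_of_rgp:
        return []
    return [spot_of_rgp[gene.rgp]]


# Hypothetical family F1: two members in RGPs belonging to different spots, one outside any RGP.
genes = [Gene("g1", "F1", "RGP_a"), Gene("g2", "F1", "RGP_b"), Gene("g3", "F1", None)]
spot_of_rgp = {"RGP_a": 95, "RGP_b": 82}

for g in genes:
    print(g.name, family_level_spots(g, genes, spot_of_rgp), gene_level_spots(g, spot_of_rgp))
```

In this toy case, the family-level lookup gives [82, 95] for all three genes, including g3, which has no RGP. The gene-level lookup gives [95], [82], and [] respectively.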

Thank you for the bug report. Someone will fix it in the upcoming release.

Adelme

Great, thanks for getting back to me!