sanger-pathogens/seroba

The "cd_cluster.tsv" created recently is different from the previous one

Closed this issue · 4 comments

cd_cluster_old.txt
cd_cluster_new.txt

cd_cluster_old.txt is the previous file, and cd_cluster_new.txt is the new cd_cluster.tsv that the program created when I built another copy on a second device.
Is this due to the newer version of KMC or Python 3 that I used on the new device?

The new cd_cluster.tsv appears to have some problems, and it makes SeroBA serotyping end with errors for some serotypes.
Please let me know whether you can reproduce the problem and whether there is a solution.

Josh

Hi Josh,

thank you for reporting this problem. Do you know which serotypes are involved?

Best,
Lennard

Hi Lennard,

I compared the two versions of the cd_cluster file and found the following.
New Serotypes:
['35D', '39X', 'alternative_aliB_NT', 'Swiss_NT', '10X', '11E', '06G', '06F']
Uncovered Old Serotypes:
['07F', '17F', '24B', '24F', '22F', '07A', '41F', '22A', '15F', '31', '23B1', '18F', '19C', '18B', '17A', '18C', '45', '16A']
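
A comparison like this could be sketched as follows. This is a minimal sketch, not the script actually used here: it assumes each line of the cluster file describes one cluster and that every whitespace-separated field on the line is a serotype name. If the first column is a cluster identifier instead, it would have to be skipped. The function names are hypothetical.

```python
from pathlib import Path


def serotypes_in_clusters(lines):
    """Collect every name that appears in a cluster listing.

    Assumption: one cluster per line, fields separated by tabs or spaces,
    and every field is a serotype name.
    """
    names = set()
    for line in lines:
        names.update(line.split())
    return names


def compare_clusters(old_lines, new_lines):
    """Return (names only in the new file, names only in the old file)."""
    old = serotypes_in_clusters(old_lines)
    new = serotypes_in_clusters(new_lines)
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__":
    old_path = Path("cd_cluster_old.txt")
    new_path = Path("cd_cluster_new.txt")
    # Only run the comparison when both attachments are present locally.
    if old_path.exists() and new_path.exists():
        added, missing = compare_clusters(
            old_path.read_text().splitlines(),
            new_path.read_text().splitlines(),
        )
        print("New Serotypes:", added)
        print("Uncovered Old Serotypes:", missing)
```

Using sets keeps the comparison independent of how serotypes are ordered or grouped within clusters, so only names that are missing or newly introduced are reported.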

I found this error when I ran the analysis on an isolate with serotype "17F". It worked well with the previous version but raised an error with the new version.

Josh

Hi Josh,
I have been able to reproduce the error. It is related to new default parameter settings in recent versions of ariba. Ariba now limits the length of non-coding sequences to a maximum of 20 kb by default, so some serotypes were discarded during the database build process. I have adapted these settings for SeroBA. I hope this will work fine for you. For more information, please have a look at: #49.
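
To illustrate the effect described above, here is a minimal sketch of a length filter. It is not ariba's code: the function name, the input dictionary, and the inclusive comparison are assumptions made for illustration; only the 20 kb default comes from this thread.

```python
def split_by_max_length(seq_lengths, max_length=20_000):
    """Split reference sequences into kept and discarded by length.

    seq_lengths: dict mapping a sequence name to its length in nucleotides.
    max_length:  upper limit; 20,000 mirrors the default mentioned above.
    Assumption: a sequence exactly at the limit is kept.
    """
    kept, discarded = {}, {}
    for name, length in seq_lengths.items():
        target = kept if length <= max_length else discarded
        target[name] = length
    return kept, discarded


# Hypothetical names and lengths, for illustration only.
kept, discarded = split_by_max_length({"short_ref": 15_000, "long_ref": 25_000})
print("kept:", sorted(kept))
print("discarded:", sorted(discarded))
```

Any reference sequence longer than the limit ends up in the discarded group, which is how whole serotypes can silently disappear from the built database.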

Best,
Lennard

Thanks, Lennard.
I will try the new version.

Josh