klebgenomics/Kleborate

K and O loci identification

AntonS-bio opened this issue · 1 comment

Hi, I've noticed behavior in version 2.0.1 that is correct but possibly unintended, though it's probably present in subsequent versions as well.

When identifying K loci
kleborate -a my.fasta -k

the results show some KL and OL loci as long as the FASTA contains a sequence longer than the k-mer size used (21 nt). I've tested this with random sequences shorter than 30 nt, long non-bacterial sequences, etc. The output correctly reports K_locus_confidence as None, so the behavior is not an error, but I wouldn't be surprised if quite a few people ignore the confidence, especially when looking at results in the terminal instead of in Kleborate_results.txt. This could lead to manuscripts misreporting loci.

For example, in my terminal, the None value for the K_locus_confidence column appears exactly under RmST.
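One way to avoid reading a confidence value from the wrong column is to pair each locus call with its confidence before printing. The sketch below is a hypothetical helper, not part of Kleborate; it assumes the tab-delimited Kleborate_results.txt has a header row containing strain, K_locus, K_locus_confidence, O_locus and O_locus_confidence columns, so check the header of your own file first.

```python
import csv
from typing import Dict, Iterable, List

# Column names are an assumption based on Kleborate v2 output; verify against your header.
LOCUS_COLUMNS = [("K_locus", "K_locus_confidence"), ("O_locus", "O_locus_confidence")]


def paired_loci(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one dict per genome, with each locus call joined to its confidence."""
    reader = csv.DictReader(lines, delimiter="\t")
    rows = []
    for record in reader:
        row = {"strain": record["strain"]}
        for locus_col, conf_col in LOCUS_COLUMNS:
            # Keep the call and its confidence in one string so they cannot drift apart.
            row[locus_col] = f"{record[locus_col]} [{record[conf_col]}]"
        rows.append(row)
    return rows
```

For example, passing an open handle to Kleborate_results.txt would yield rows like {"strain": ..., "K_locus": "KL2 [None]", ...}, which makes a None confidence hard to overlook.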

Maybe the current behaviour (showing a K and O locus but with confidence None) should be changed to show something like "Unknown" for the K and O locus?

Hi,

Thanks for bringing this to our attention. We have also noticed that users sometimes ignore the Kaptive confidence information. For reference, we recommend reporting only K and O loci with a confidence level of "Good" or better unless further investigation has been carried out, e.g. to check for missing sequence, assembly fragmentation problems or putative novel loci. We've updated the latest version of Kleborate (v2.0.4) to change the default behaviour so that any loci with confidence levels 'Low' or 'None' are reported as 'unknown', with the best-matching locus shown in parentheses (the confidence threshold can be changed using the --min_kaptive_confidence flag). We hope this will help to minimise reporting of erroneous results.
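The new default described above amounts to a simple threshold rule. A minimal sketch follows, assuming Kaptive's confidence levels rank None < Low < Good < High < Very high < Perfect. The function report_locus and its min_confidence parameter are hypothetical stand-ins for the --min_kaptive_confidence behaviour, not Kleborate's actual code, and the exact string Kleborate prints may differ.

```python
# Kaptive confidence levels, ordered from lowest to highest (assumed ordering).
CONFIDENCE_LEVELS = ["None", "Low", "Good", "High", "Very high", "Perfect"]


def report_locus(best_match: str, confidence: str, min_confidence: str = "Good") -> str:
    """Return the locus call if confidence meets the threshold, else 'unknown (best_match)'.

    Raises ValueError if either confidence string is not a recognised level.
    """
    if CONFIDENCE_LEVELS.index(confidence) >= CONFIDENCE_LEVELS.index(min_confidence):
        return best_match
    # Below the threshold: flag the call as unknown but keep the best match visible.
    return f"unknown ({best_match})"
```

With the default threshold, a KL2 call at 'Low' or 'None' confidence becomes "unknown (KL2)", while the same call at 'Good' or above is reported as KL2; raising min_confidence to "High" would also demote 'Good' calls.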

Thanks again for your feedback.
Kelly