OLC-Bioinformatics/ConFindr

Missing PercentContam in cgMLST mode


I am trying to run ConFindr in the (experimental) cgMLST mode, using a subset of a Staphylococcus aureus cgMLST scheme downloaded from https://www.cgmlst.org/ncs (and formatted as described in the ConFindr documentation).

The command I tried:

confindr -fid .R1. -rid .R2. -i fastq_files/ -o output_dir -t 4 -Xmx 10g --verbosity debug --keep_files -cgmlst cgmlst_subset.fasta

The stdout is then:

  2023-12-11 11:41:29  Welcome to ConFindr 0.8.1! Beginning analysis of your samples... 
  2023-12-11 11:41:29  Beginning analysis of sample sampleName... 
  2023-12-11 11:41:29  Sample is paired. Sample name is sampleName 
  2023-12-11 11:41:29  Checking for cross-species contamination... 
  2023-12-11 11:41:46  Extracting conserved core genes... 
  2023-12-11 11:42:05  Quality trimming... 
  2023-12-11 11:43:40  Detecting contamination... 
  2023-12-11 11:43:40  Since this is the first time you are using this database, it needs to be indexed by KMA. This might take a while 
  2023-12-11 11:44:10  Total gene length is 1633944 
  2023-12-11 13:43:45  Done! Number of contaminating SNVs found: 166
 
  2023-12-11 13:43:49  Contamination detection complete! 

And the contents of confindr_report.csv are:

Sample,Genus,NumContamSNVs,ContamStatus,BasesExamined,DatabaseDownloadDate
sampleName,Staphylococcus,166,True,1633944,2021-4-29

Notably missing from the output is the estimate of percent contamination (PercentContam) that I am used to seeing.
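For anyone scripting around the report, a quick check of the header shows which columns are present. This is only a minimal sketch: the report text is copied from above, and `missing_columns` is a hypothetical helper name, not part of ConFindr.

```python
import csv
import io

# Report contents copied verbatim from the confindr_report.csv shown above.
REPORT = """Sample,Genus,NumContamSNVs,ContamStatus,BasesExamined,DatabaseDownloadDate
sampleName,Staphylococcus,166,True,1633944,2021-4-29
"""


def missing_columns(report_text, expected):
    """Return the expected column names that are absent from the CSV header.

    Hypothetical helper for illustration only; not a ConFindr function.
    """
    reader = csv.DictReader(io.StringIO(report_text))
    header = reader.fieldnames or []
    return [name for name in expected if name not in header]


print(missing_columns(REPORT, ["NumContamSNVs", "PercentContam"]))
# prints ['PercentContam']
```

To run the same check on a real report, read the file with `open("output_dir/confindr_report.csv").read()` and pass the text in; the path assumes the `-o output_dir` used in the command above.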

Some details:

ConFindr version 0.8.1 from bioconda.
KMA version 1.4.9.

My bad - I did not realize that the percent contamination estimate (PercentContam) had been removed from the report (commit ec3ae7a).