OLC-Bioinformatics/ConFindr

Request details on programmatic database setup for confindr

Closed this issue · 5 comments

Hi,

I used the below command:
confindr_database_setup -s key_secret.txt -o confindr_database/

This produced databases for only three genera, as shown below:
confindr_database$ ls
Escherichia_db_cgderived.fasta Salmonella_db_cgderived.fasta gene_allele.txt rMLST_combined.fasta
Listeria_db_cgderived.fasta download_date.txt profiles.txt refseq.msh

However, I also need the db_cgderived.fasta files for the genera Yersinia and Campylobacter.

May I know how to obtain those programmatically as well?

Best Regards,
Bala

Hi Bala,

Since you have the rMLST database, you don't need the CGE-derived files. Just run ConFindr in rMLST mode (use the --rmlst flag), and it should be able to process samples from any bacterial genus.

A

Since the Escherichia samples reported 38310 bases examined, it looks like you're still not using --rmlst mode. Could you please include the command-line call to ConFindr that you used?

The bases examined are the total number of bases present in the sequence files containing the alleles returned by the KMA screen (this can be printed to the screen using the --verbosity debug argument). This sequence file can be inspected if you use the -k argument to keep the files. It is named as follows: sample_name_alleles.fasta, e.g. FIAR-847_S5_1_trim_alleles.fasta.
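If it helps, here is a minimal sketch for checking that number by hand from a kept alleles file. It assumes the file is plain FASTA and that summing the sequence lines matches what ConFindr reports as bases examined, per the description above; the function name count_fasta_bases is made up for illustration and is not part of ConFindr.

```shell
# Hypothetical helper (not part of ConFindr): sum the lengths of all
# sequence lines in a FASTA file, skipping the '>' header lines.
count_fasta_bases() {
    awk '!/^>/ {
             gsub(/[ \t\r]/, "")   # drop stray whitespace / carriage returns
             total += length($0)
         }
         END { print total + 0 }' "$1"
}
```

Usage would be something like `count_fasta_bases FIAR-847_S5_1_trim_alleles.fasta`, run on a file kept with -k.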

If you are using CGE-derived databases, the alleles in the FASTA file should have names like b0436_1, while if you are using the rMLST database, the alleles should have names like BACT000001_10671.
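A quick way to check which naming scheme a kept alleles file follows is sketched below. The pattern (BACT, digits, underscore, digits) is inferred only from the single example name above, so treat it as an assumption; the function name count_allele_styles is hypothetical and not part of ConFindr.

```shell
# Hypothetical helper (not part of ConFindr): count FASTA headers that look
# like rMLST allele names (e.g. BACT000001_10671) versus all other headers.
count_allele_styles() {
    cas_total=$(grep -c '^>' "$1" || true)
    cas_rmlst=$(grep -c -E '^>BACT[0-9]+_[0-9]+' "$1" || true)
    echo "rmlst=${cas_rmlst} other=$((cas_total - cas_rmlst))"
}
```

Usage would be something like `count_allele_styles FIAR-847_S5_1_trim_alleles.fasta`. If the rmlst count is zero, the run was most likely not using the rMLST database.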

A

pcrxn commented

I'll close this issue in 30 days if there are no further updates!

pcrxn commented

Closed as stale.