Request details on programmatic database setup for confindr

Question


Closed this issue a year ago · 5 comments

Hi,

I used the following command:
confindr_database_setup -s key_secret.txt -o confindr_database/

This produced CGE-derived databases for only three genera:
confindr_database$ ls
Escherichia_db_cgderived.fasta
Listeria_db_cgderived.fasta
Salmonella_db_cgderived.fasta
download_date.txt
gene_allele.txt
profiles.txt
rMLST_combined.fasta
refseq.msh

However, I also need the db_cgderived.fasta files for the Yersinia and Campylobacter genera.

How can I obtain those programmatically?

Best Regards,
Bala
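If you are scripting this, you can first check which genera the downloaded database folder covers. Below is a minimal sketch that relies only on the `<Genus>_db_cgderived.fasta` naming in the listing above. The helper name `cge_genera` is hypothetical and not part of ConFindr.

```python
from pathlib import Path


def cge_genera(database_dir):
    """Return sorted genus names that have a CGE-derived FASTA in database_dir.

    Assumes files follow the "<Genus>_db_cgderived.fasta" pattern produced by
    confindr_database_setup; other files (rMLST, profiles, etc.) are ignored.
    """
    suffix = "_db_cgderived.fasta"
    return sorted(
        path.name[: -len(suffix)]
        for path in Path(database_dir).glob("*" + suffix)
        if path.is_file()
    )


if __name__ == "__main__":
    wanted = ["Campylobacter", "Escherichia", "Listeria", "Salmonella", "Yersinia"]
    available = set(cge_genera("confindr_database"))
    # Genera listed here have no CGE-derived file in the folder.
    print("No CGE-derived database for:", [g for g in wanted if g not in available])
```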

Answer 1 · 2022-03-14T14:10:36.000Z

Hi Bala,

Since you have the rMLST database, you don't need the CGE-derived files. Just run ConFindr in rMLST mode (use the --rmlst flag), and it should be able to process any bacterial genus.

A
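Below is a minimal sketch of invoking ConFindr in rMLST mode from Python. Only `--rmlst`, `-k` and `--verbosity debug` come from this thread. The `confindr` command name and the `-i` (input folder), `-o` (output folder) and `-d` (database folder) flags are assumptions based on the ConFindr README, so check them against `confindr -h` for your version. The helper names and folder paths are hypothetical.

```python
import subprocess


def confindr_rmlst_command(input_dir, output_dir, database_dir,
                           keep_files=False, debug=False):
    """Build the argument list for a ConFindr run in rMLST mode.

    ASSUMPTION: -i, -o and -d are the input-folder, output-folder and
    database-folder flags; check `confindr -h` for your installed version.
    """
    cmd = ["confindr", "-i", str(input_dir), "-o", str(output_dir),
           "-d", str(database_dir), "--rmlst"]
    if keep_files:
        cmd.append("-k")  # keep intermediate files such as <sample>_alleles.fasta
    if debug:
        cmd += ["--verbosity", "debug"]  # print extra detail, e.g. bases examined
    return cmd


def run_confindr(cmd):
    """Run ConFindr; raises CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical folder names, modelled on the test folders in this thread.
    command = confindr_rmlst_command("campy_test/reads", "campy_test/results/confindr",
                                     "confindr_database", keep_files=True)
    print(" ".join(command))
    # run_confindr(command)  # requires ConFindr on PATH
```

Building the argument list separately from running it makes the exact call easy to log, which is what the maintainer asks for later in this thread.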

Answer 2 · 2022-03-14T14:23:45.000Z

Hi, thanks for the reply! I tested as you suggested and got the following results:

$ cat ecoli_test/results/confindr/confindr_report.csv
Sample,Genus,NumContamSNVs,ContamStatus,PercentContam,PercentContamStandardDeviation,BasesExamined,DatabaseDownloadDate
FIAR-847_S5_1_trim,Escherichia,0,False,0,0,38310,ND
FIAR-847_S5_2_trim,Escherichia,0,False,0,0,38310,ND

$ cat salmonella_test/results/confindr/confindr_report.csv
Sample,Genus,NumContamSNVs,ContamStatus,PercentContam,PercentContamStandardDeviation,BasesExamined,DatabaseDownloadDate
FIAR-844_S2_L001_1_trim,Salmonella,0,False,0,0,61956,ND
FIAR-844_S2_L001_2_trim,Salmonella,1,False,0,0,61956,ND

$ cat listeria_test/results/confindr/confindr_report.csv
Sample,Genus,NumContamSNVs,ContamStatus,PercentContam,PercentContamStandardDeviation,BasesExamined,DatabaseDownloadDate
FIXT-208_S17_L001_1_trim,Listeria,0,False,0,0,28425,ND
FIXT-208_S17_L001_2_trim,Listeria,0,False,0,0,28425,ND

BasesExamined has a value for these three genera. However, for the following two genera it is 0 (and PercentContam is ND):

$ cat campy_test/results/confindr/confindr_report.csv
Sample,Genus,NumContamSNVs,ContamStatus,PercentContam,PercentContamStandardDeviation,BasesExamined,DatabaseDownloadDate
131469S9L001_1_trim,Campylobacter,0,False,ND,ND,0,ND
131469S9L001_2_trim,Campylobacter,0,False,ND,ND,0,ND

$ cat yersinia_test/results/confindr/confindr_report.csv
Sample,Genus,NumContamSNVs,ContamStatus,PercentContam,PercentContamStandardDeviation,BasesExamined,DatabaseDownloadDate
FIXT-266_S6_L001_1_trim,Yersinia,0,False,ND,ND,0,ND
FIXT-266_S6_L001_2_trim,Yersinia,0,False,ND,ND,0,ND

Could you clarify why BasesExamined was zero for these two genera but non-zero for E. coli, Salmonella and Listeria? It would also help to know how ConFindr computes the BasesExamined values.
Best Regards,
Bala
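When checking many samples, you can flag the pattern above (BasesExamined of 0) automatically. Below is a minimal sketch that uses only the column names shown in these reports. The helper name `samples_with_no_bases` is hypothetical, and the example rows in the `__main__` block are copied from the reports above.

```python
import csv
import io


def samples_with_no_bases(report_text):
    """Return (Sample, Genus) pairs whose BasesExamined is 0 in a confindr_report.csv."""
    reader = csv.DictReader(io.StringIO(report_text))
    return [
        (row["Sample"], row["Genus"])
        for row in reader
        if row["BasesExamined"].strip() == "0"  # compare as text so "ND" cannot crash it
    ]


if __name__ == "__main__":
    example = (
        "Sample,Genus,NumContamSNVs,ContamStatus,PercentContam,"
        "PercentContamStandardDeviation,BasesExamined,DatabaseDownloadDate\n"
        "FIAR-847_S5_1_trim,Escherichia,0,False,0,0,38310,ND\n"
        "FIXT-266_S6_L001_1_trim,Yersinia,0,False,ND,ND,0,ND\n"
    )
    for sample, genus in samples_with_no_bases(example):
        print(f"{sample} ({genus}): no bases examined")
```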

Answer 3 · 2022-03-14T15:03:08.000Z

Since the Escherichia samples had 38310 bases examined, it looks like you're still not running in rMLST mode (--rmlst). Could you please include the ConFindr command you used?

BasesExamined is the total number of bases in the sequence file containing the alleles returned by the KMA screen (this value can be printed to the screen with the --verbosity debug argument). You can inspect this file if you use the -k argument to keep intermediate files. It is named sample_name_alleles.fasta, e.g. FIAR-847_S5_1_trim_alleles.fasta.
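As a cross-check, you can total the bases in a kept `_alleles.fasta` file yourself. Below is a minimal sketch that assumes a plain FASTA file with optionally line-wrapped sequences. It is not ConFindr's own code, so it should match the reported value only if BasesExamined is computed exactly as described above. The helper name `total_bases` and the allele names in the example are hypothetical.

```python
def total_bases(fasta_text):
    """Sum the sequence lengths in FASTA text, skipping header lines and blank lines."""
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


if __name__ == "__main__":
    # Hypothetical two-allele file; real files come from running ConFindr with -k.
    example = ">allele_a\nACGTACGTAC\nGTAC\n>allele_b\nACGT\n"
    print(total_bases(example))  # 18
```

On a real file, read it first, e.g. `total_bases(open("FIAR-847_S5_1_trim_alleles.fasta").read())`.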

If you are using CGE-derived databases, the alleles in the FASTA file should have names like b0436_1, while if you are using the rMLST database, the alleles should have names like BACT000001_10671.

A
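Following the naming convention above, the headers of a kept alleles file show which database a run used. Below is a minimal sketch. The regular expressions are assumptions generalized from the two example names (b0436_1 and BACT000001_10671) and may not cover every allele name. `database_type` is a hypothetical helper.

```python
import re

# Patterns generalized from the two example names in this thread (assumption).
RMLST_NAME = re.compile(r"^BACT\d+_\d+$")
CGE_NAME = re.compile(r"^b\d+_\d+$")


def database_type(fasta_text):
    """Guess whether allele headers are rMLST-style, CGE-derived, mixed, unknown or empty."""
    kinds = set()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        name = fields[0] if fields else ""
        if RMLST_NAME.match(name):
            kinds.add("rMLST")
        elif CGE_NAME.match(name):
            kinds.add("CGE-derived")
        else:
            kinds.add("unknown")
    if len(kinds) == 1:
        return kinds.pop()
    return "mixed" if kinds else "empty"
```

For example, a file whose headers all look like `>BACT000001_10671` returns "rMLST", which would indicate that --rmlst was in effect.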

Answer 4 · 2023-05-23T19:36:04.000Z

I'll close this issue in 30 days if there are no further updates!

Answer 5 · 2023-08-15T14:05:35.000Z

Closed as stale.