Binning tool for improve binning percentage in existing binning methods
-
run the python script with taxonomic label data
$python mahalanobis_binning.py 10s binned_contigs_features.csv unbinned_contigs_features.csv taxon.csv final_ouput.csv --p 1500 9 argument 1 : dataset name (ex : '10s') argument 2 : initial binning method feature file (ex : binned_contigs_features.csv) argument 3 : unbinned contigs feature file (ex : unbinned_contigs_features.csv) argument 4 : taxon file [this contain the taxons of all contigs] (ex : taxon.csv )[OPTIONAL] argument 5 : final output file (ex :final_ouput.csv) argument 6 : indicate parameter selcting (ex: "--p") [OPTIONAL] argument 7 : filtering length (ex: 1500) [OPTIONAL] argument 8 : Critical value (ex: 9) [OPTIONAL]
-
sample dataset in shown in the test folder.
-
feature files for unbinned contigs and binned contigs should follow following structure.
Binned feature file headers => [contig_id, contig_length, bin , feature_1, feature_2,....., feature_N ] Unbinned feature file headers => [contig_id, contig_length, feature_1, feature_2,....., feature_N ]
-
taxon file contian following structure (not a requirement to have headers in csv file)
Taxon file headers => [contig_id, Actual taxon]
-
argument 4 (taxon file) is needed to calculate accuracy and other properties of results. Even without that it is possible to run the script and get the output.
$python mahalanobis_binning.py 10s binned_contigs_features.csv unbinned_contigs_features.csv final_ouput.csv
-
You can pick the filtering length and another threshold value in command line option. Depending on those values the results may changes.The order of argument follow the following.
--p [filtering length] [critical value] eg : taxon file available case : python mahalanobis_binning.py 10s sample_data/10s/10s_binned_contigs_features.csv sample_data/10s/10s_unbinned_contigs_features.csv sample_data/10s/10s_taxon.csv sample_data/10s/10s_final_ouput.csv --p 1500 9 taxon file not available case : python mahalanobis_binning.py 10s sample_data/10s/10s_binned_contigs_features.csv sample_data/10s/10s_unbinned_contigs_features.csv sample_data/10s/10s_final_ouput.csv --p 1500 9
-
you can provide the relative paths or absolute path of files.
-
final output will display following format csv file
output file => [id, bin]
-
PythonTestBed.java file is there to run the script using java.(usefull for web applications). If it doesn't need (for normal run), make
testBedEnable = 0
-
for degugging perposes it is possible to enbale progressbar
progressBarEnable = 1