EMBL-PKU/BASALT

Almost 1/3 of the obtained bins from BASALT were removed by dRep dereplicate -sa 0.99.


In my case, BASALT produced many more bins (115) than metaWRAP (75), but many of them were assigned to the same species by GTDB. After running dRep dereplicate -sa 0.99, only 79 bins remained, about as many as metaWRAP produced. The effectiveness of BASALT's de-replication algorithm at removing redundant genomes is therefore debatable, and for this reason I doubt that BASALT recovers more and higher-quality MAGs from benchmark data than other binning tools.

I also end up with several bins with ANI > 0.99. In my latest run BASALT gave 113 bins that were reduced to 86 after running dRep. It would be great if you improved the de-replication step.
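
For reference, here is a quick sketch of the arithmetic behind the two reports in this thread. The helper name `fraction_removed` and the run labels are made up for illustration; only the bin counts come from the posts above.

```python
def fraction_removed(before: int, after: int) -> float:
    """Return the fraction of bins dropped by dereplication."""
    if before <= 0 or not 0 <= after <= before:
        raise ValueError("expected 0 <= after <= before and before > 0")
    return (before - after) / before


# Bin counts before and after dRep, as reported in this thread.
runs = {
    "original report": (115, 79),
    "follow-up comment": (113, 86),
}

for label, (before, after) in runs.items():
    removed = before - after
    share = fraction_removed(before, after)
    print(f"{label}: {removed} of {before} bins removed ({share:.1%})")
```

On these numbers, the first run loses a little under a third of its bins and the second roughly a quarter.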