metagenome-atlas/atlas

dRep issue - DataFrame.pivot

Closed this issue · 2 comments

Hi Silas, I'm not quite sure whether I should raise this with you or on the dRep page, but while running the atlas pipeline I get stuck in the second step of dRep, which throws the error TypeError: DataFrame.pivot() takes 1 positional argument but 4 were given. I'm currently running this stage with:
atlas run genomes

On the off chance it was an install error, I deleted and reran dRep, following your working suggestion in #547.

But alas, I returned to the same error, and #547 was the closest similar error I was able to find. Any advice?

***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************
    
Will filter the genome list
Loading genomes from a list
325 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
100.00% of genomes passed checkM filtering
***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************
    
Running primary clustering
Running pair-wise MASH clustering
Traceback (most recent call last):
  File "/p/work1/mkardish/subs/databases/conda_envs/e4ca0a910149c0c7b21c70f20a241e3d_/bin/dRep", line 32, in <module>
    Controller().parseArguments(args)
  File "/p/work1/mkardish/subs/databases/conda_envs/e4ca0a910149c0c7b21c70f20a241e3d_/lib/python3.10/site-packages/drep/controller.py", line 100, in parseArguments
    self.dereplicate_operation(**vars(args))
  File "/p/work1/mkardish/subs/databases/conda_envs/e4ca0a910149c0c7b21c70f20a241e3d_/lib/python3.10/site-packages/drep/controller.py", line 48, in dereplicate_operation
    drep.d_workflows.dereplicate_wrapper(kwargs['work_directory'],**kwargs)
  File "/p/work1/mkardish/subs/databases/conda_envs/e4ca0a910149c0c7b21c70f20a241e3d_/lib/python3.10/site-packages/drep/d_workflows.py", line 37, in dereplicate_wrapper
    drep.d_cluster.controller.d_cluster_wrapper(wd, **kwargs)
  File "/p/work1/mkardish/subs/databases/conda_envs/e4ca0a910149c0c7b21c70f20a241e3d_/lib/python3.10/site-packages/drep/d_cluster/controller.py", line 179, in d_cluster_wrapper
    GenomeClusterController(workDirectory, **kwargs).main()
  File "/p/work1/mkardish/subs/databases/conda_envs/e4ca0a910149c0c7b21c70f20a241e3d_/lib/python3.10/site-packages/drep/d_cluster/controller.py", line 32, in main
    self.run_primary_clustering()
  File "/p/work1/mkardish/subs/databases/conda_envs/e4ca0a910149c0c7b21c70f20a241e3d_/lib/python3.10/site-packages/drep/d_cluster/controller.py", line 100, in run_primary_clustering
    Mdb, Cdb, cluster_ret = drep.d_cluster.compare_utils.all_vs_all_MASH(self.Bdb, self.wd.get_dir('MASH'), **self.kwargs)
  File "/p/work1/mkardish/subs/databases/conda_envs/e4ca0a910149c0c7b21c70f20a241e3d_/lib/python3.10/site-packages/drep/d_cluster/compare_utils.py", line 115, in all_vs_all_MASH
    Cdb, cluster_ret = cluster_mash_database(Mdb, **kwargs)
  File "/p/work1/mkardish/subs/databases/conda_envs/e4ca0a910149c0c7b21c70f20a241e3d_/lib/python3.10/site-packages/drep/d_cluster/compare_utils.py", line 279, in cluster_mash_database
    linkage_db = db.pivot("genome1","genome2","dist")
TypeError: DataFrame.pivot() takes 1 positional argument but 4 were given

New pandas version - new conflicts.

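I guess that you got the latest pandas version, 2.0, which breaks dRep: since pandas 2.0, DataFrame.pivot() accepts keyword arguments only, so the positional call db.pivot("genome1","genome2","dist") at the bottom of your traceback fails.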
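To see why the message counts "1 positional argument but 4 were given" (the one allowed argument is `self`), here is a minimal sketch that does not use pandas at all. The class `Frame` and the two helper functions are hypothetical stand-ins invented for illustration; `Frame.pivot` only copies the keyword-only parameter style that pandas 2.0 uses for `DataFrame.pivot()`, and its body is a placeholder.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame; this is not pandas code."""

    def pivot(self, *, columns, index=None, values=None):
        # The bare `*` makes every parameter after `self` keyword-only.
        return {"index": index, "columns": columns, "values": values}


def try_positional(frame):
    """Call pivot positionally, as in the traceback; return the error text."""
    try:
        frame.pivot("genome1", "genome2", "dist")
    except TypeError as err:
        return str(err)
    return None


def call_with_keywords(frame):
    """Same call with explicit keywords, in the old order: index, columns, values."""
    return frame.pivot(index="genome1", columns="genome2", values="dist")


if __name__ == "__main__":
    frame = Frame()
    print(try_positional(frame))
    print(call_with_keywords(frame))
```

With real pandas, the keyword form of the failing line would presumably be `db.pivot(index="genome1", columns="genome2", values="dist")`, since the pre-2.0 positional order was index, columns, values. Whether and where dRep itself changes that line is up to the dRep maintainers.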

What you can do is activate the conda env:
conda activate /p/work1/mkardish/subs/databases/conda_envs/e4ca0a910149c0c7b21c70f20a241e3d_

Then check which version of pandas you have:

conda list pandas

If my assumption is correct, install an older version of pandas:

conda install pandas=1.5.1

That's the version I have.
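The same version check can also be done from Python inside the activated env. This is a minimal sketch using only the standard library; the helper names `major_version`, `pandas_breaks_positional_pivot`, and `installed_pandas_version` are made up for illustration, and the 2.0 cutoff is the assumption stated above rather than something dRep documents.

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '1.5.1'."""
    return int(version.split(".")[0])


def pandas_breaks_positional_pivot(version):
    """True for pandas 2.0 and later, where pivot() is keyword-only."""
    return major_version(version) >= 2


def installed_pandas_version():
    """Return the installed pandas version string, or None if pandas is absent."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pandas_version()
    if found is None:
        print("pandas is not installed in this environment")
    elif pandas_breaks_positional_pivot(found):
        print(f"pandas {found}: the positional pivot() call from the traceback will fail")
    else:
        print(f"pandas {found}: the positional pivot() call is still accepted")
```

Run it with the Python interpreter of the conda env in question, otherwise it reports on whichever pandas the current interpreter sees.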

I think it's also a good idea to raise the issue on the dRep GitHub and link the two issues.

Awesome! That seemed to fix the conflict.
dRep issue opened at: MrOlm/drep#189