matsengrp/multidms

global epistasis models give different results with `pandas` 2.2

jbloom opened this issue · 2 comments

@jgallowa07, this looks like a potentially significant bug. When I run the global epistasis models in the dms-vep-pipeline-3/test_example, I get substantially different results depending on whether I use pandas 2.1 or pandas 2.2 (I also have pyarrow installed). The pandas 2.1 results are consistent with earlier versions, but the pandas 2.2 results are much different.

Note that I cannot rule out that this is caused by a bug in pandas 2.2.0, which is a recent release, but I wanted to flag it here.

Good catch @jbloom. Indeed, when I updated to pandas==2.2.0 the unit tests failed. I've added a patch over at #129 that explains things in more detail.

However, this patch is for multidms>=3.1, and I understand that the dms-vep-pipeline still relies on an older version, multidms==0.2.1, until dms-vep/dms-vep-pipeline-3#91 is finished and merged.

Hopefully I will get that PR ready for review this week, but if you need a patch for this issue now, I could branch off of 0.2.1 to patch this specific bug, and you could point to that branch to install directly from GitHub.

Great, and thanks for the explanation of the bug. For now I have pinned pandas=2.1 for the pipeline so this should work fine until the new multidms is merged into the pipeline.
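
In case it is useful alongside the pin, here is a minimal sketch of a runtime guard that fails loudly if an environment drifts to a newer pandas. The function names are hypothetical and not part of multidms or the pipeline. It also assumes that every pandas release from 2.2 onward is affected, whereas this thread only confirms the discrepancy for 2.2.0.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_major_minor(version_string):
    """Return (major, minor) as ints from a version string such as '2.1.4'."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def pandas_is_affected(version_string):
    """Return True for pandas 2.2 or later.

    Assumption: all releases >= 2.2 are treated as affected; only 2.2.0
    was actually observed to give different results.
    """
    return parse_major_minor(version_string) >= (2, 2)


def check_installed_pandas():
    """Raise RuntimeError if the installed pandas is in the assumed affected range."""
    try:
        installed = version("pandas")
    except PackageNotFoundError:
        return None  # pandas is not installed, so there is nothing to check
    if pandas_is_affected(installed):
        raise RuntimeError(
            f"pandas {installed} may change global epistasis results; "
            "pin pandas to 2.1 until the patched multidms is in use."
        )
    return installed
```

Calling `check_installed_pandas()` at the top of a pipeline script would stop the run before any model fitting if the pin were ever dropped by accident.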