global epistasis models give different results with `pandas` 2.2

Question


jbloom opened this issue 9 months ago · 2 comments

@jgallowa07, this seems like a potentially significant bug. When I run the global epistasis models in the dms-vep-pipeline-3/test_example, I get substantially different results depending on whether I use pandas 2.1 or pandas 2.2 (I also have pyarrow installed). The pandas 2.1 results are consistent with earlier versions, but the pandas 2.2 results are very different.

Note that I cannot rule out that this is caused by a bug in pandas 2.2.0, which is a recent release, but I wanted to flag it here.

Answer 1 · 2024-01-30T01:27:24.000Z

Good catch, @jbloom. Indeed, when I updated to pandas==2.2.0, the unit tests failed. I've added a patch over at #129 that explains things in more detail.

However, this patch is for multidms>=3.1, and I understand that the dms-vep-pipeline still relies on the older multidms==0.2.1 until dms-vep/dms-vep-pipeline-3#91 is finished and merged.

Hopefully I will get that PR ready for review this week, but if you need a fix for this issue now, I could branch off of 0.2.1 to patch this specific bug, and you could point to that branch to install directly from GitHub.

Answer 2 · 2024-01-30T03:11:03.000Z

Great, and thanks for the explanation of the bug. For now I have pinned pandas=2.1 for the pipeline so this should work fine until the new multidms is merged into the pipeline.
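In case it helps anyone else on the older multidms, here is a minimal sketch of a runtime guard that could sit alongside the pin. It is not part of multidms or the pipeline: the function names are hypothetical, and the `(2, 2)` threshold is an assumption based only on the versions discussed in this thread. It uses only the standard library and reads the installed version without importing pandas.

```python
import re
from importlib import metadata

# Assumption: pandas 2.2 is the first minor series where the discrepancy
# was reported in this thread; adjust if that turns out to be wrong.
FIRST_AFFECTED = (2, 2)


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.2.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pandas_is_affected(version: str) -> bool:
    """Return True if the given pandas version is at or above the threshold."""
    return major_minor(version) >= FIRST_AFFECTED


def check_installed_pandas() -> None:
    """Raise if the installed pandas falls in the assumed affected range."""
    try:
        installed = metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return  # pandas is not installed, so there is nothing to check
    if pandas_is_affected(installed):
        raise RuntimeError(
            f"pandas {installed} may give different global epistasis results "
            "with the unpatched multidms; pin pandas to 2.1 or upgrade multidms"
        )
```

Calling `check_installed_pandas()` at the start of a run would fail early instead of silently producing different fits. The guard should be removed once the patched multidms is merged into the pipeline.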