PAIR-code/lit

Potential performance issue: .to_dict method slow in pandas below 2.2

TendouArisu opened this issue · 1 comment

Issue Description:

Hello.
I have found a performance degradation in the .to_dict method of pandas 1.5.3, and I noticed that parts of this repository depend on that version. Several files, such as lit_nlp/examples/datasets/glue.py, call the affected API, and there may be more. I am not sure whether this pandas performance problem affects this repository. The problem is discussed on the pandas GitHub in #50990 and #54824.
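One way to check whether a given environment is affected is to time the call directly under each pandas version. The snippet below is only a rough sketch: the helper name time_to_dict, the table shape, and the choice of orient="records" are assumptions for illustration, not taken from glue.py or from the linked pandas issues.

```python
import timeit

import pandas as pd


def time_to_dict(n_rows=10_000, n_cols=5, repeats=3):
    """Return the best wall-clock time in seconds for one DataFrame.to_dict call.

    The synthetic integer table and orient="records" are illustrative
    assumptions; substitute the data and orient the real code path uses.
    """
    df = pd.DataFrame({f"col_{i}": list(range(n_rows)) for i in range(n_cols)})
    # Run the conversion once per repeat and keep the fastest run,
    # which is the least affected by unrelated system noise.
    timings = timeit.repeat(
        lambda: df.to_dict(orient="records"), number=1, repeat=repeats
    )
    return min(timings)


if __name__ == "__main__":
    print(f"pandas {pd.__version__}: {time_to_dict():.4f} s")
```

Running the same script in two virtual environments (for example one with pandas 1.5.3 and one with pandas >= 2.2) gives a direct comparison on the machine in question.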

Suggestion

I would recommend upgrading to pandas >= 2.2 or exploring other ways to optimize performance.
Any other workarounds or solutions would be greatly appreciated.
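If an upgrade is not immediately possible, one possible workaround is to build the row dictionaries without calling .to_dict at all. The sketch below assumes the calling code only needs a list of per-row dicts (the orient="records" shape); the helper name records_via_itertuples is hypothetical, and whether it is actually faster on a given pandas version has to be measured rather than assumed.

```python
import pandas as pd


def records_via_itertuples(df):
    """Build a list of per-row dicts without calling DataFrame.to_dict.

    Hypothetical helper for illustration. It relies only on
    DataFrame.itertuples, which yields one plain tuple per row when
    index=False and name=None are passed.
    """
    columns = list(df.columns)
    return [
        dict(zip(columns, row))
        for row in df.itertuples(index=False, name=None)
    ]


if __name__ == "__main__":
    frame = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
    print(records_via_itertuples(frame))
```

This keeps the output shape the same as the records orient, so it can be swapped in at individual call sites and timed against the original call before committing to it.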
Thank you!

Thanks for the report! There are a few (significant) version bumps in the works for LIT and I'll add this to the list. Will keep you updated on progress as best I can.