noatamir/pyladies-berlin-sprints

ENH / CoW: Use the "lazy copy" (with Copy-on-Write) optimization in more methods where appropriate

Closed · 1 comment

See pandas-dev/pandas#49473 for more details on the background of the issue.

That issue has a long list of all the methods we should address, but some of them might be less straightforward or might require discussion. Therefore, I am making a smaller list here of methods that I think should be possible to tackle during the sprint:

  • assign
  • reorder_levels
  • droplevel
  • align
  • swaplevel
  • (convert_dtypes)

Pull requests tackling one of the bullet points above are certainly welcome!

  • Pick one of the methods above (best to stick to one method per PR)
  • Update the method to make use of a lazy copy (in many cases this might mean using copy(deep=None) somewhere, but for some methods it will be more involved)
  • Add a test for it in /pandas/tests/copy_view/test_methods.py (you can mimic one of the existing ones, e.g. test_select_dtypes)
    • You can run the test with PANDAS_COPY_ON_WRITE=1 pytest pandas/tests/copy_view/test_methods.py to test it with CoW enabled (pandas will check that environment variable). The test needs to pass with both CoW disabled and enabled.
    • The tests make use of a using_copy_on_write fixture that can be used within the test function to test different expected results depending on whether CoW is enabled or not.

Example PR: pandas-dev/pandas#49557

Take drop level