data-apis/dataframe-api-compat

Monkeypatch via a function call rather than on import?

thomasjpfan opened this issue · 5 comments

For scikit-learn, I prefer to not automatically monkeypatch when using this library. I plan to use a custom function to get the DataFrame API:

def get_dataframe_standard(df):
    if hasattr(df, "__dataframe_standard__"):
        return df.__dataframe_standard__()
    elif _is_pandas_df(df):
        from .pandas_standard import dataframe_standard as pandas_dataframe_standard

        return pandas_dataframe_standard(df)
    ...

Is there a way to adjust this library so that it only monkeypatches after an explicit function call?

from pandas_standard import patch_dataframe

patch_dataframe("pandas")

A similar pattern is used in Intel's scikit-learn implementation.
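A rough sketch of what opt-in patching could look like. It is not this library's API: FakeDataFrame stands in for pandas.DataFrame so the snippet runs without pandas, and StandardWrapper, _PATCH_TARGETS and the body of patch_dataframe are all hypothetical.

```python
# Sketch only: every name below is an illustrative assumption.


class FakeDataFrame:
    """Stand-in for a third-party dataframe class such as pandas.DataFrame."""

    def __init__(self, data):
        self.data = data


class StandardWrapper:
    """Stand-in for a standard-compliant wrapper around a dataframe."""

    def __init__(self, df):
        self.dataframe = df


# Maps a library name to the class that would receive the extra method.
_PATCH_TARGETS = {"fake": FakeDataFrame}


def patch_dataframe(library):
    """Attach __dataframe_standard__ only when the caller asks for it."""
    try:
        cls = _PATCH_TARGETS[library]
    except KeyError:
        raise ValueError(f"Unknown dataframe library: {library!r}") from None
    cls.__dataframe_standard__ = lambda self: StandardWrapper(self)


df = FakeDataFrame([[1, 2, 3], [4, 5, 6]])
before = hasattr(df, "__dataframe_standard__")  # nothing is patched yet
patch_dataframe("fake")
after = hasattr(df, "__dataframe_standard__")  # patched by the explicit call
```

The point is that defining or importing the module has no side effects. The class only changes once patch_dataframe is called, so a library such as scikit-learn can depend on the package without altering the user's environment.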

Hey @thomasjpfan

You could do

from pandas_standard import PandasDataFrame

df_compliant = PandasDataFrame(df)

Eventually the aim is to get this upstreamed to pandas/polars, but for now, that should work to test it out and see what the gaps are

The issue is that importing pandas_standard itself will monkeypatch, even if I only want to use PandasDataFrame:

from pandas_standard import PandasDataFrame
import pandas as pd

x = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

assert hasattr(x, "__dataframe_standard__")

A library using pandas_standard will end up monkeypatching a user's environment.

Specifically, the pandas_standard import inside the following function will trigger the monkeypatch:

def get_dataframe_standard(df):
    if hasattr(df, "__dataframe_standard__"):
        return df.__dataframe_standard__()
    elif _is_pandas_df(df):
        from pandas_standard import PandasDataFrame
        return PandasDataFrame(df)
    elif _is_polars_df(df):
        ...

ah makes sense

I'll just remove the monkeypatching then

have updated - does this work for you?

Yea, that's okay. The only nit is that convert_to_standard_compliant_dataframe should raise an error if it does not recognize the input:

https://github.com/MarcoGorelli/impl-dataframe-api/blob/318463d3aabe92a7a92d05e8c5611b286f1f6d17/standard.py#L8-L14
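To illustrate the requested behaviour, here is a minimal sketch of a converter that raises on unrecognized input. It is not the code at the linked lines: StandardWrapper and NativeStandardFrame are hypothetical stand-ins, and the choice of TypeError and its message are assumptions.

```python
class StandardWrapper:
    """Hypothetical stand-in for a standard-compliant dataframe."""

    def __init__(self, df):
        self.dataframe = df


class NativeStandardFrame:
    """Hypothetical dataframe that already exposes the standard entry point."""

    def __dataframe_standard__(self):
        return StandardWrapper(self)


def convert_to_standard_compliant_dataframe(df):
    """Return a standard-compliant object, or raise for unknown inputs."""
    if hasattr(df, "__dataframe_standard__"):
        return df.__dataframe_standard__()
    # Library-specific branches (pandas, polars, ...) would go here.
    # Falling through means the input is not supported, so fail loudly
    # instead of silently returning None.
    raise TypeError(
        f"Expected a supported dataframe, got {type(df).__name__!r} instead"
    )


try:
    convert_to_standard_compliant_dataframe([1, 2, 3])
except TypeError as exc:
    message = str(exc)
```

Without the final raise, the function would return None for an unsupported input and the failure would only surface later, far from its cause.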

I likely won't be able to use standard.py as is, because scikit-learn only optionally depends on the dataframe libraries. But I can work around it with my own custom function.