data-apis/dataframe-api-compat

Example to adopt API doesn't work for polars

amrit110 opened this issue · 3 comments

Hi, I tried the example:

import pandas as pd
import polars as pl


df_pandas = pd.read_parquet('iris.parquet')
df_polars = pl.scan_parquet('iris.parquet')

def my_dataframe_agnostic_function(df):
    df = df.__dataframe_consortium_standard__(api_version='2023.08-beta')

    mask = df.get_column_by_name('species') != 'setosa'
    df = df.filter(mask)

    for column_name in df.get_column_names():
        if column_name == 'species':
            continue
        new_column = df.get_column_by_name(column_name)
        new_column = (new_column - new_column.mean()) / new_column.std()
        df = df.insert(loc=len(df.get_column_names()), label=f'{column_name}_scaled', value=new_column)

    return df.dataframe

#  Then, either of the following will work as expected:
my_dataframe_agnostic_function(df_pandas)
my_dataframe_agnostic_function(df_polars)

But this fails for the polars dataframe at the filter operation, with the error:

AttributeError: 'PolarsDataFrame' object has no attribute 'filter'

I tried with dataframe_api_compat version 0.1.13 and polars version 0.19.3.

Thanks @amrit110 for the report. The example should have either:

  • api_version='2023.09-beta'
  • OR, api_version='2023.08-beta' and .get_rows_by_mask instead of .filter

I think I'll change it to the former and make a new release.

We're still in "beta", so unfortunately breaking changes may still happen, but I'd like to get to the point where we can make a "near perfect backwards compatibility" promise.
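For anyone who has to support both API versions in the meantime, one option is to dispatch on whichever method name the object exposes. This is only a minimal sketch and is not part of dataframe-api-compat: `filter_rows` and the two stub classes are hypothetical stand-ins, and the only names taken from this thread are the methods `filter` and `get_rows_by_mask`.

```python
def filter_rows(df, mask):
    """Filter rows with whichever method name the given object provides."""
    # Name reported in this thread for api_version='2023.09-beta'.
    if hasattr(df, "filter"):
        return df.filter(mask)
    # Name reported in this thread for api_version='2023.08-beta'.
    if hasattr(df, "get_rows_by_mask"):
        return df.get_rows_by_mask(mask)
    raise AttributeError(
        f"{type(df).__name__!r} has neither 'filter' nor 'get_rows_by_mask'"
    )


# Hypothetical stubs standing in for the two API versions, so the sketch
# runs without the real library or a parquet file.
class OldStyleFrame:
    def __init__(self, rows):
        self.rows = list(rows)

    def get_rows_by_mask(self, mask):
        return OldStyleFrame(r for r, keep in zip(self.rows, mask) if keep)


class NewStyleFrame:
    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, mask):
        return NewStyleFrame(r for r, keep in zip(self.rows, mask) if keep)


if __name__ == "__main__":
    mask = [True, False, True]
    print(filter_rows(OldStyleFrame(["a", "b", "c"]), mask).rows)
    print(filter_rows(NewStyleFrame(["a", "b", "c"]), mask).rows)
```

In `my_dataframe_agnostic_function`, the call `df = df.filter(mask)` would then become `df = filter_rows(df, mask)`. Pinning a single `api_version` and upgrading, as suggested above, is the simpler fix when that is possible.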

I've tagged a new API version and pushed a new version (0.1.16) of the implementation. Does this work for you now?

This now works, @MarcoGorelli! Thanks :)