Require `api_version` argument in `__dataframe_standard__` rather than `__dataframe_namespace__`?
MarcoGorelli opened this issue · 0 comments
MarcoGorelli commented
Reminder: what these magic methods are
Currently, the way to convert a non-compliant dataframe to a compliant one is by calling `df.__dataframe_standard__`:
dataframe-api/spec/purpose_and_scope.md
Lines 362 to 365 in 16dea0b
Once you've got a compliant dataframe, you can get the namespace with `__dataframe_namespace__`.
Why `__dataframe_standard__` needs `api_version`:
Take the following example:
def remove_outliers(df, column):
    # Get a Standard-compliant dataframe.
    df_standard = df.__dataframe_standard__(api_version="2023.07")
    # Use methods from the Standard specification.
    col = df_standard.get_column_by_name(column)
    z_score = (col - col.mean()) / col.std()
    df_standard_filtered = df_standard.get_rows_by_mask((z_score > -3) & (z_score < 3))
    # Return the result as a dataframe from the original library.
    return df_standard_filtered.dataframe
I'm not using `__dataframe_namespace__` here, so the only way I have of asking for a certain `api_version` of the standard is via `__dataframe_standard__`.
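To make this concrete, here is a minimal sketch of how a library might accept and validate the version at this entry point. `ToyDataFrame`, `ToyStandardDataFrame`, and `SUPPORTED_API_VERSIONS` are hypothetical names invented for illustration, not part of the Standard or of any real library; only the `__dataframe_standard__(api_version=...)` call shape is taken from the example above.

```python
# Hypothetical sketch: none of these classes exist in a real library.
SUPPORTED_API_VERSIONS = {"2023.07"}  # assumed set of versions, for illustration


class ToyStandardDataFrame:
    """Stand-in for a Standard-compliant wrapper."""

    def __init__(self, native, api_version):
        self.dataframe = native         # the original, library-specific object
        self.api_version = api_version  # the version the caller asked for


class ToyDataFrame:
    """Stand-in for a non-compliant, library-specific dataframe."""

    def __init__(self, data):
        self.data = data

    def __dataframe_standard__(self, *, api_version):
        # Required keyword: this is the only place a caller can state which
        # version it wants if it never touches the namespace.
        if api_version not in SUPPORTED_API_VERSIONS:
            raise ValueError(f"Unsupported api_version: {api_version!r}")
        return ToyStandardDataFrame(self, api_version)


df = ToyDataFrame({"a": [1, 2, 3]})
df_standard = df.__dataframe_standard__(api_version="2023.07")
```

With this shape, a function like `remove_outliers` gets a well-defined version without ever calling `__dataframe_namespace__`.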
Why `__dataframe_namespace__` probably doesn't need `api_version`
Say I do:
df_standard = df.__dataframe_standard__(api_version="2023.07")
namespace = df_standard.__dataframe_namespace__()
Then it seems natural that the namespace returned would be the "2023.07" one, so `api_version` doesn't need repeating in `__dataframe_namespace__`.
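One way this could work, sketched with hypothetical classes (`ToyCompliantFrame` and `ToyNamespace` are illustrative names, not real API): the compliant dataframe remembers the version it was created with and passes it on to the namespace.

```python
# Hypothetical sketch of version propagation; names are illustrative only.
class ToyNamespace:
    """Stand-in for the namespace object of a given Standard version."""

    def __init__(self, api_version):
        self.api_version = api_version


class ToyCompliantFrame:
    """Stand-in for a Standard-compliant dataframe."""

    def __init__(self, api_version):
        self._api_version = api_version

    def __dataframe_namespace__(self):
        # No api_version parameter: reuse the one fixed at conversion time.
        return ToyNamespace(self._api_version)


df_standard = ToyCompliantFrame(api_version="2023.07")
namespace = df_standard.__dataframe_namespace__()
```

Under this assumption the version is chosen exactly once, so the dataframe and its namespace cannot disagree.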