RFC: `item()` to return scalar for arrays with exactly 1 element.
randolf-scholz opened this issue · 8 comments
def item(self) -> Scalar:
    """If the array contains exactly one element, return it as a scalar; otherwise raise ValueError."""
Examples:
numpy.ndarray.item
torch.Tensor.item
pandas.Series.item
pandas.Index.item
polars.Series.item
xarray.DataArray.item
Demo:
import pytest
import torch  # used below but missing from the original imports
import xarray as xr
import pandas as pd
import polars as pl
import numpy as np


# Empty and multi-element inputs should both be rejected.
@pytest.mark.parametrize("data", [[], [1, 2, 3]])
@pytest.mark.parametrize(
    "array_type", [torch.tensor, np.array, pd.Series, pd.Index, pl.Series, xr.DataArray]
)
def test_item_valueerror(data, array_type):
    array = array_type(data)
    with pytest.raises(ValueError):
        array.item()


# A single-element input should convert without raising.
@pytest.mark.parametrize(
    "array_type", [torch.tensor, np.array, pd.Series, pd.Index, pl.Series, xr.DataArray]
)
def test_item(array_type):
    array = array_type([1])
    array.item()
Currently, only `torch` fails, because it raises `RuntimeError` instead of `ValueError`.
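The mismatch described above could be smoothed over on the caller's side with a thin wrapper. The following is only a sketch: `item_strict` and the `RaisesRuntimeError` stand-in class are hypothetical names invented for illustration, not part of torch or any other library, and the stand-in merely mimics the behaviour reported here.

```python
class RaisesRuntimeError:
    """Hypothetical stand-in mimicking the reported behaviour:
    item() raises RuntimeError unless there is exactly one element."""

    def __init__(self, data):
        self.data = list(data)

    def item(self):
        if len(self.data) != 1:
            raise RuntimeError(
                f"cannot convert {len(self.data)} elements to a scalar"
            )
        return self.data[0]


def item_strict(array):
    """Call array.item(), re-raising RuntimeError as ValueError."""
    try:
        return array.item()
    except RuntimeError as exc:
        # Assumption: any RuntimeError raised by item() means
        # "not exactly one element"; other causes would be masked.
        raise ValueError(str(exc)) from exc
```

With such a wrapper both tests above would see a `ValueError`, at the cost of also relabelling any unrelated `RuntimeError` that `item()` might raise.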
This was discussed in #710, along with the more general `to_list`, which also works for ND arrays.
`item()` is a bit different from `to_list`, and honestly I find it confusing that a method named `to_list` can return something that is not a list.
`.item()` is indeed more constrained than `to_list`, and a bit cleaner. I checked other libraries: NumPy, PyTorch, JAX and CuPy implement `.item()`; Dask does not. (TF doesn't have it in the docs, so probably also not, but I can't check.) CuPy/JAX do the transfer to CPU if the ndarray is on GPU.
This is a minor convenience method though, since `float()` & co work as well. They are clearer, since they are type-stable, and they also work for Dask. The only downside is that if you want some dtype-generic implementation to return a single element, you have to write a little utility for it that calls `int`/`float`/`complex`/`bool` as appropriate. Something like:
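def as_pyscalar(x):
    # `xp` is the array API namespace that `x` belongs to.
    # Note that isdtype() takes a dtype, not an array.
    if xp.isdtype(x.dtype, 'real floating'):
        return float(x)
    elif xp.isdtype(x.dtype, 'complex floating'):
        return complex(x)
    elif xp.isdtype(x.dtype, 'integral'):
        return int(x)
    elif xp.isdtype(x.dtype, 'bool'):
        return bool(x)
    else:
        # raise error, or handle custom/non-standard dtypes if desired
        raise TypeError(f"unsupported dtype: {x.dtype}")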
Static typing of such a function, and of `.item()`, would also be a little annoying, as it requires overloads.
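To illustrate the point about overloads, here is a rough sketch of the shape such annotations might take. Everything in it is an assumption made for illustration: `FakeArray` is a hypothetical stand-in tagged with a dtype kind (real array types are not parametrized this way in general), and the kind strings simply reuse the ones passed to `isdtype` above.

```python
from dataclasses import dataclass
from typing import Any, Generic, Literal, TypeVar, overload

K = TypeVar("K")


@dataclass
class FakeArray(Generic[K]):
    """Hypothetical one-element array, tagged with its dtype kind."""
    kind: K
    value: Any


# One overload per Python scalar type that can be returned.
@overload
def as_pyscalar(x: FakeArray[Literal["bool"]]) -> bool: ...
@overload
def as_pyscalar(x: FakeArray[Literal["integral"]]) -> int: ...
@overload
def as_pyscalar(x: FakeArray[Literal["real floating"]]) -> float: ...
@overload
def as_pyscalar(x: FakeArray[Literal["complex floating"]]) -> complex: ...


_CONVERTERS = {
    "bool": bool,
    "integral": int,
    "real floating": float,
    "complex floating": complex,
}


def as_pyscalar(x):
    """Runtime implementation shared by all overloads."""
    try:
        convert = _CONVERTERS[x.kind]
    except KeyError:
        raise TypeError(f"unsupported dtype kind: {x.kind!r}") from None
    return convert(x.value)
```

Four signatures are needed for one small function, and a type checker can only pick between them if the array type exposes its dtype statically, which is the annoying part.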
`item` also works on arrays with multiple dimensions, whereas we decided to make it so `float` does not.
>>> np.array([1]).item()
1
We discussed this in a call today, and concluded that this falls into a bucket of functionality that is useful, but also easy to implement on top of what's already in the standard. In addition, there are problems with trying to add this: an `item()` method is hard, because it's missing in some libraries and missing methods cannot be worked around in `array-api-compat`. If we did this, a function would be the way to go - but since that's not present in any libraries, it'd be new - hence more work, and likely to incur resistance from array library maintainers.
Outcome:
- Create the `array-api-extra` package where this kind of function can live, and add it there (probably as `as_pyscalar` or a similarly descriptive name, not as `item`).
- Only reconsider adding it to the standard itself in the future if most/all array libraries have already added that function.
On a very fundamental level, I believe `.item()` makes no sense on DataFrame-like objects (`pandas.DataFrame`, `polars.DataFrame`, `pyarrow.Table`, etc.) because these are designed to represent heterogeneous data types.
From a mathematical PoV, `item()` acts on array-like data with homogeneous type, as a representation of the natural isomorphism V → K, when V is a 1-dimensional vector space over K.
Is this usage guaranteed?
If so, should it be added somewhere to the specification? I looked for it here.
FWIW I also like the `item` method since it's all I've ever needed and it's simpler than `tolist`. I wonder if it should be on the array namespace rather than the array (`def item(x: Array, /) -> complex | bool`), since it can be implemented using the array's public interface. (This is a common test in OO design for what should be a method versus a bare function.)
Yes, `__float__` and so on are guaranteed (modulo the "lazy" note). See https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__float__.html#array_api.array.__float__. Though Ralf's helper should also include an `if x.ndim != 1 or x.size != 1: raise ValueError` check.
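To see how the check quoted above relates to the earlier remark that `item` also works on arrays with multiple dimensions, the two conditions can be compared on plain shape tuples. This is a sketch only: the helper names are made up for illustration, and shapes are modelled as tuples of ints rather than real arrays.

```python
import math


def has_single_element(shape):
    """True if an array of this shape holds exactly one element, for any ndim."""
    # math.prod(()) is 1, so a 0-dimensional shape counts as one element.
    return math.prod(shape) == 1


def passes_quoted_check(shape):
    """The stricter condition quoted above: one dimension and one element."""
    ndim, size = len(shape), math.prod(shape)
    return not (ndim != 1 or size != 1)


for shape in [(), (1,), (1, 1), (3,), (0,)]:
    print(shape, has_single_element(shape), passes_quoted_check(shape))
```

The two checks agree on `(1,)`, `(3,)` and `(0,)` but differ on `()` and `(1, 1)`, so which one to use depends on whether the helper is meant to mirror `item` or to be stricter.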