Error when using numpy integers as `bins` value
AlexisMignon opened this issue · 1 comments
AlexisMignon commented
I ran into an error when trying to execute the following code:
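import matplotlib.pyplot as plt
import numpy as np
import phik

np.random.seed(0)
n_samples = 1000  # assumed value; n_samples was not defined in the original snippet
x = np.linspace(-1, 1, n_samples)
y = x ** 2 + 0.1 * np.random.randn(n_samples)
# Iterating over nbins yields numpy integers (np.int64), not built-in ints
nbins = np.arange(10, 100, 5)
phiks = [phik.phik_from_array(x, y, bins=n, num_vars=["x", "y"]) for n in nbins]
plt.plot(nbins, phiks, "o-")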
UnboundLocalError Traceback (most recent call last)
<ipython-input-417-784e900e88ec> in <module>
4
5 nbins = np.arange(10, 100, 5)
----> 6 phiks = [phik.phik_from_array(x, y, bins=n, num_vars=["x", "y"]) for n in nbins]
7 plt.plot(nbins, phiks, "o-")
<ipython-input-417-784e900e88ec> in <listcomp>(.0)
4
5 nbins = np.arange(10, 100, 5)
----> 6 phiks = [phik.phik_from_array(x, y, bins=n, num_vars=["x", "y"]) for n in nbins]
7 plt.plot(nbins, phiks, "o-")
~/Projets/py38/lib/python3.8/site-packages/phik/phik.py in phik_from_array(x, y, num_vars, bins, quantile, noise_correction, dropna, drop_underflow, drop_overflow)
405 if len(num_vars) > 0:
406 df = array_like_to_dataframe(x, y)
--> 407 x, y = bin_data(df, num_vars, bins=bins, quantile=quantile).T.values
408
409 return phik_from_binned_array(
~/Projets/py38/lib/python3.8/site-packages/phik/binning.py in bin_data(data, cols, bins, quantile, retbins)
125 elif isinstance(bins[col], (list, np.ndarray)):
126 xbins = bins[col]
--> 127 binned_data[col], bin_labels = bin_array(data[col].astype(float).values, xbins)
128 if retbins:
129 bins_dict[col] = bin_labels
UnboundLocalError: local variable 'xbins' referenced before assignment
More precisely, it is raised when a numpy integer value is passed as the `bins` argument. For instance:
phik.phik_from_array(x, y, bins=np.int64(10), num_vars=["x", "y"])
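The cause can be checked without phik. In Python 3, `np.int64` is not a subclass of the built-in `int`, so an `isinstance(..., (int, float))` test rejects it, whereas `np.float64` does inherit from `float`. A minimal check that uses only numpy (the variable names are illustrative):

```python
import numpy as np

n = np.int64(10)

# numpy integer scalars are not subclasses of the built-in int in Python 3
int64_passes = isinstance(n, (int, float))

# np.float64 inherits from the built-in float, so it passes the same test
float64_passes = isinstance(np.float64(10.0), (int, float))

# np.issubdtype recognises the numpy scalar type as an integer type
is_numpy_integer = np.issubdtype(type(n), np.integer)

print(int64_passes, float64_passes, is_numpy_integer)
```

This is why the problem shows up with the elements of `np.arange(...)` but not with a plain `bins=10`.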
There are two problems in `phik/binning.py`, more precisely in the for loop from lines 117 to 129:
    for col in cols:
        if isinstance(bins, (int, float)):
            xbins = bin_edges(data[col].astype(float), int(bins), quantile=quantile)
        elif isinstance(bins, dict):
            if isinstance(bins[col], (int, float)):
                xbins = bin_edges(
                    data[col].astype(float), int(bins[col]), quantile=quantile
                )
            elif isinstance(bins[col], (list, np.ndarray)):
                xbins = bins[col]
        binned_data[col], bin_labels = bin_array(data[col].astype(float).values, xbins)
        if retbins:
            bins_dict[col] = bin_labels
- The exception is raised because the input type matches none of the tested types, so `xbins` is never assigned. An `else` clause is missing for the case where the type is unknown.
- The first `if` condition does not capture numpy integer or float types. It would be more robust to test for all integer and float types, for instance using `np.issubdtype`.
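As a rough sketch of what both fixes could look like together, the snippet below separates the type dispatch from the binning itself. The helpers `is_scalar_number` and `select_bins` are hypothetical names used for illustration only; they are not part of phik and this is not necessarily how the pull request implements it. As a simplification, a top-level list of bin edges is accepted the same way as a per-column one.

```python
import numpy as np


def is_scalar_number(value):
    """Return True for built-in and numpy integer/float scalars.

    Hypothetical helper for illustration; not part of phik.
    """
    if isinstance(value, (int, float)):
        return True
    # numpy scalars (np.int64, np.float32, ...) derive from np.generic
    if isinstance(value, np.generic):
        return bool(
            np.issubdtype(value.dtype, np.integer)
            or np.issubdtype(value.dtype, np.floating)
        )
    return False


def select_bins(bins, col):
    """Return an int bin count or explicit bin edges for column `col`.

    Hypothetical helper for illustration; not part of phik.
    """
    # A dict gives one specification per column; anything else applies to all columns
    spec = bins[col] if isinstance(bins, dict) else bins
    if is_scalar_number(spec):
        return int(spec)
    if isinstance(spec, (list, np.ndarray)):
        return spec
    # Explicit failure instead of falling through with an unassigned variable
    raise TypeError(f"Unsupported type for bins: {type(spec).__name__}")


print(select_bins(np.int64(10), "x"))
print(select_bins({"x": np.float32(5)}, "x"))
```

With an explicit `raise` at the end, an unsupported `bins` value produces a `TypeError` that names the offending type rather than an `UnboundLocalError` further down.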
I'm preparing a pull request.
mbaak commented
Yes, good catch, this needs fixing. I'll wait for your PR.