KaveIO/PhiK

Error when using numpy integers as `bins` value

AlexisMignon opened this issue · 1 comment

I ran into the error below when executing the following code:

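import matplotlib.pyplot as plt
import numpy as np
import phik

np.random.seed(0)
n_samples = 1000  # not defined in the original snippet; value assumed here
x = np.linspace(-1, 1, n_samples)
y = x ** 2 + 0.1 * np.random.randn(n_samples)

# Compute phik for a range of bin counts; each n is a numpy integer, not a Python int
nbins = np.arange(10, 100, 5)
phiks = [phik.phik_from_array(x, y, bins=n, num_vars=["x", "y"]) for n in nbins]
plt.plot(nbins, phiks, "o-")
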
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-417-784e900e88ec> in <module>
      4 
      5 nbins = np.arange(10, 100, 5)
----> 6 phiks = [phik.phik_from_array(x, y, bins=n, num_vars=["x", "y"]) for n in nbins]
      7 plt.plot(nbins, phiks, "o-")

<ipython-input-417-784e900e88ec> in <listcomp>(.0)
      4 
      5 nbins = np.arange(10, 100, 5)
----> 6 phiks = [phik.phik_from_array(x, y, bins=n, num_vars=["x", "y"]) for n in nbins]
      7 plt.plot(nbins, phiks, "o-")

~/Projets/py38/lib/python3.8/site-packages/phik/phik.py in phik_from_array(x, y, num_vars, bins, quantile, noise_correction, dropna, drop_underflow, drop_overflow)
    405     if len(num_vars) > 0:
    406         df = array_like_to_dataframe(x, y)
--> 407         x, y = bin_data(df, num_vars, bins=bins, quantile=quantile).T.values
    408 
    409     return phik_from_binned_array(

~/Projets/py38/lib/python3.8/site-packages/phik/binning.py in bin_data(data, cols, bins, quantile, retbins)
    125             elif isinstance(bins[col], (list, np.ndarray)):
    126                 xbins = bins[col]
--> 127         binned_data[col], bin_labels = bin_array(data[col].astype(float).values, xbins)
    128         if retbins:
    129             bins_dict[col] = bin_labels

UnboundLocalError: local variable 'xbins' referenced before assignment

More precisely, the error is raised whenever a numpy integer is passed as the bins argument. For instance:

phik.phik_from_array(x, y, bins=np.int64(10), num_vars=["x", "y"])
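
The short check below, which does not depend on phik, illustrates why a numpy integer slips through an `isinstance(bins, (int, float))` test: in Python 3, `np.int64` is not a subclass of the built-in `int`, whereas `np.float64` does subclass `float`. The variable names are only illustrative.

```python
import numbers

import numpy as np

n = np.int64(10)

# The kind of check used in bin_data: a numpy integer matches neither int nor float.
matches_builtin = isinstance(n, (int, float))

# np.float64 subclasses Python's float, so numpy floats do pass the same check.
float_matches = isinstance(np.float64(10.0), (int, float))

# Two checks that recognise numpy scalars as well as built-in numbers.
matches_dtype = np.issubdtype(type(n), np.integer)
matches_abc = isinstance(n, numbers.Integral)

print(matches_builtin, float_matches, matches_dtype, matches_abc)
# False True True True
```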

There are two problems in phik/binning.py, specifically in the for loop spanning lines 117 to 129:

    for col in cols:
        if isinstance(bins, (int, float)):
            xbins = bin_edges(data[col].astype(float), int(bins), quantile=quantile)
        elif isinstance(bins, dict):
            if isinstance(bins[col], (int, float)):
                xbins = bin_edges(
                    data[col].astype(float), int(bins[col]), quantile=quantile
                )
            elif isinstance(bins[col], (list, np.ndarray)):
                xbins = bins[col]
        binned_data[col], bin_labels = bin_array(data[col].astype(float).values, xbins)
        if retbins:
            bins_dict[col] = bin_labels

  1. The exception is raised because the input type matches none of the tested types, so xbins is never assigned. An else clause is missing to handle unknown types.
  2. The first if condition does not capture numpy integer or float types. It would be more robust to test for all integer and float types, for instance using np.issubdtype.
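
A minimal sketch of how both points could be addressed, not the actual patch: `resolve_bins` is a hypothetical helper that does not exist in phik, and it uses `numbers.Real` as one alternative to `np.issubdtype` for recognising numpy scalars. Unlike the current loop, it also assumes a plain list or array may be shared by all columns.

```python
import numbers

import numpy as np


def resolve_bins(bins, col):
    """Hypothetical helper: return a bin count (int) or explicit bin edges for `col`."""
    # Accept a per-column dict or a single specification shared by all columns.
    spec = bins[col] if isinstance(bins, dict) else bins

    # numbers.Real covers int, float and numpy scalars such as np.int64;
    # bool is excluded explicitly because it is a subclass of int.
    if isinstance(spec, numbers.Real) and not isinstance(spec, bool):
        return int(spec)
    if isinstance(spec, (list, np.ndarray)):
        return np.asarray(spec, dtype=float)

    # Fail explicitly instead of hitting an UnboundLocalError further down.
    raise TypeError(f"Unsupported type for bins: {type(spec).__name__}")
```

With such a helper, `resolve_bins(np.int64(10), "x")` would return the Python integer 10, while an unsupported value such as a string would raise a `TypeError` that names the offending type.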

I'm preparing a pull request.

mbaak commented

Yes, good catch, this needs fixing. I'll wait for your PR.