sdv-dev/RDT

Fix pandas FutureWarning in UniformEncoder

R-Palazzo opened this issue · 0 comments

Environment Details

  • RDT version: 1.12.1

Error Description

In the UniformEncoder, replacing the NaN placeholder with NaN sometimes raises the following FutureWarning:

FutureWarning: The behavior of Series.replace (and DataFrame.replace) with CategoricalDtype is deprecated. 
In a future version, replace will only be used for cases that preserve the categories. 
To change the categories, use ser.cat.rename_categories instead.

It gets raised here:

result = result.replace(nan_name, np.nan)

A possible fix would be to use ser.cat.remove_categories() instead of replace when the result is categorical.
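A minimal sketch of what that fix could look like, assuming `result` is a categorical Series at that point. The helper name `restore_nan` and the placeholder value used below are hypothetical and not taken from the RDT code base. Removing a category sets every value that belonged to it to NaN, so the categorical path never calls `replace`:

```python
import numpy as np
import pandas as pd


def restore_nan(result: pd.Series, nan_name) -> pd.Series:
    """Turn the NaN placeholder back into NaN (hypothetical helper)."""
    if isinstance(result.dtype, pd.CategoricalDtype):
        # Values in a removed category become NaN, so no replace() is needed.
        if nan_name in result.cat.categories:
            return result.cat.remove_categories(nan_name)
        return result

    # Non-categorical data: keep the call quoted in this issue.
    return result.replace(nan_name, np.nan)
```

For example, `restore_nan(pd.Series(['a', 'placeholder'], dtype='category'), 'placeholder')` would return a categorical Series whose second value is NaN and whose only remaining category is `'a'`.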

Steps to reproduce

import pandas as pd
from rdt.transformers import UniformEncoder

# Category intervals, including one for missing values (the None key).
intervals = {
    ' United-States': [0.0, 0.8],
    None: [0.8, 0.9],
    ' Jamaica': [0.9, 0.99],
}
data = pd.Series([0.107995, 0.148025, 0.632702], name='native-country', dtype=float)

# Set the fitted state by hand instead of calling fit.
transformer = UniformEncoder()
transformer.intervals = intervals
transformer.dtype = 'O'
transformer._reverse_transform(data)
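Since the warning only shows up sometimes, it may help to record warnings explicitly when running the snippet above. The following is a small sketch using only the standard library; the helper name `collect_future_warnings` is hypothetical:

```python
import warnings


def collect_future_warnings(func, *args, **kwargs):
    """Call func and return (result, messages of the FutureWarnings it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown earlier in the session.
        warnings.simplefilter('always')
        result = func(*args, **kwargs)

    messages = [
        str(w.message) for w in caught if issubclass(w.category, FutureWarning)
    ]
    return result, messages
```

With the reproduction above, this could be called as `_, messages = collect_future_warnings(transformer._reverse_transform, data)`, then `messages` inspected for the Series.replace deprecation text.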