opendp/smartnoise-sdk

predicate always fails. Error at center

lo2aayy opened this issue · 2 comments

When I train an AIM model with continuous_columns set to an empty list [], I get the error shown below.

I initially opened an issue in OpenDP (predicate always fails. Error at center: inferred type is i32, expected f64. See opendp/opendp#298),
but then I thought it might be smartnoise-related, since this happens only when continuous_columns is empty.

predicate always fails. Error at center: inferred type is i32, expected f64. See https://github.com/opendp/opendp/discussions/298
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /Workspace/Repos/.internal/005c97e368_commits/08a43358e25669f66a31e94c5c713d45168c8955/src/models/aim.py:38, in Aim.train_synthesizer(self)
     37 try:
---> 38     self.synth.fit(self.X_train, preprocessor_eps=self.preprocessor_eps,
     39                    categorical_columns=self.categorical_cols, continuous_columns=self.numerical_cols)
     40 except Exception as e:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/snsynth/aim/aim.py:134, in AIMSynthesizer.fit(self, data, transformer, categorical_columns, ordinal_columns, continuous_columns, preprocessor_eps, nullable, *ignore)
    132 self.num_rows = len(data)
--> 134 self.rho = 0 if self.delta == 0 else cdp_rho(self.epsilon, self.delta)
    136 data = pd.DataFrame(train_data, columns=colnames)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/snsynth/utils.py:53, in cdp_rho(epsilon, delta)
     52     return make_fix_delta(adp, delta=budget[1])
---> 53 scale = binary_search_param(
     54     make_fixed_approxDP_gaussian,
     55     d_in=1.0, d_out=budget, T=float)
     56 return make_base_gaussian(scale).map(1.)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/opendp/mod.py:465, in binary_search_param(make_chain, d_in, d_out, bounds, T)
    414 """Useful to solve for the ideal constructor argument.
    415 
    416 Optimizes a parameterized chain `make_chain` within float or integer `bounds`,
   (...)
    463 1498
    464 """
--> 465 return binary_search(lambda param: make_chain(param).check(d_in, d_out), bounds, T)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/opendp/mod.py:529, in binary_search(predicate, bounds, T, return_sign)
    528 if bounds is None:
--> 529     bounds = exponential_bounds_search(predicate, T)
    531 if bounds is None:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/opendp/mod.py:662, in exponential_bounds_search(predicate, T)
    660         error = f". Error at center: {err}"
--> 662     raise ValueError(f"predicate always fails{error}")
    664 center, sign = binary_search(exception_predicate, bounds=exception_bounds, T=T, return_sign=True)

ValueError: predicate always fails. Error at center: inferred type is i32, expected f64. See https://github.com/opendp/opendp/discussions/298

I reproduced the error with the PUMS dataset.

  • If you comment out the df.drop line, add categorical_cols.remove('income'),
    and set continuous_columns = ['income'], it works normally.
  • This happens only with AIM; MWEM works normally.
!pip install smartnoise-synth==1.0.0
!pip install git+https://github.com/ryan112358/private-pgm.git
from collections import defaultdict

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from snsynth import Synthesizer

path = "path/to/PUMS.csv"
df = pd.read_csv(path)
continuous_columns = []

# Drop income so that only the columns treated as categorical remain.
df = df.drop("income", axis=1)
categorical_cols = list(df.columns)

# Label-encode each categorical column with its own encoder.
labelEncoderDict = defaultdict(LabelEncoder)
for column in categorical_cols:
    df[column] = labelEncoderDict[column].fit_transform(df[column])

preprocessor_eps = 0.5
# Note: epsilon is passed as an int here.
synth = Synthesizer.create("aim", epsilon=5, verbose=True)
synth.fit(df, preprocessor_eps=preprocessor_eps,
          categorical_columns=categorical_cols, continuous_columns=continuous_columns)

Hi @lo2aayy, apologies for the delayed response. This is because OpenDP requires epsilon to be a floating point number. If you change epsilon to 5.0, the example above will work. I will add some code for this synthesizer to automatically convert integers to float if the caller passes in an integer, but the mitigation in the meantime is to use floats. Thank you for reporting this.
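
Until that conversion is built in, a caller-side guard along the following lines might help. This is only a minimal sketch: as_float_budget is a hypothetical helper name, not part of snsynth or OpenDP, and the Synthesizer.create call is left as a comment because it requires smartnoise-synth to be installed.

```python
import numbers


def as_float_budget(value, name="epsilon"):
    """Hypothetical helper: return a privacy parameter as a Python float."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, numbers.Real):
        raise TypeError(f"{name} must be a real number, got {type(value).__name__}")
    return float(value)


# The int 5 from the reproduction becomes the float 5.0.
epsilon = as_float_budget(5)

# Usage with the reproduction above (not run here):
# synth = Synthesizer.create("aim", epsilon=epsilon, verbose=True)
```

Coercing at the call site keeps the rest of the script unchanged; writing epsilon=5.0 directly, as suggested above, has the same effect.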