Get errors in quickstart

Question

f-hafner opened this issue a year ago · 2 comments

f-hafner commented a year ago

When following the steps in the quickstart, I get some errors.

To reproduce:

OS: Ubuntu 20
git clone git@github.com:alan-turing-institute/privacy-sdg-toolbox.git
cd privacy-sdg-toolbox
poetry install (Python 3.9)
Then create a new notebook, run it with the project's .venv (I use VS Code), and follow the steps in the quickstart.

The specific errors I get

First, there is something wrong in this cell:

from sklearn.ensemble import RandomForestClassifier

attacker = tapas.attacks.ShadowModellingAtack(
   FeatureBasedSetClassifier(
      tapas.attacks.NaiveSetFeature() + tapas.attacks.HistSetFeature() + tapas.attacks.CorrSetFeature(),
      RandomForestClassifier(n_estimators = 100)
   ),
   label = "Groundhog"
)

~~RandomForestClassifier vs FeatureBasedSetClassifier? (why do we do the import first?)~~
FeatureBasedSetClassifier is never imported
~~either way,~~ when running the cell, I get AttributeError: module 'tapas.attacks' has no attribute 'ShadowModellingAtack'

Second, when training the Groundhog attack:

attacker = tapas.attacks.GroundhogAttack()
attacker.train(threat_model, num_samples=1000)

I get a ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[19], line 3
      1 attacker = tapas.attacks.GroundhogAttack()
----> 3 attacker.train(threat_model, num_samples=1000)

File ~/repositories/projects/GANS/privacy-sdg-toolbox/tapas/attacks/shadow_modelling.py:87, in ShadowModellingAttack.train(self, threat_model, num_samples)
     84 synthetic_datasets, labels = threat_model.generate_training_samples(num_samples)
     86 # Fit the classifier to the data.
---> 87 self.classifier.fit(synthetic_datasets, labels)
     88 self.trained = True

File ~/repositories/projects/GANS/privacy-sdg-toolbox/tapas/attacks/set_classifiers.py:144, in FeatureBasedSetClassifier.fit(self, datasets, labels)
    143 def fit(self, datasets: list[Dataset], labels: list[int]):
--> 144     self.classifier.fit(self.features(datasets), labels)

File ~/repositories/projects/GANS/privacy-sdg-toolbox/tapas/attacks/set_classifiers.py:85, in SetFeature.__call__(self, *args, **kwargs)
     84 def __call__(self, *args, **kwargs):
---> 85     return self.extract(*args, **kwargs)

File ~/repositories/projects/GANS/privacy-sdg-toolbox/tapas/attacks/set_classifiers.py:108, in CombinedSetFeatures.extract(self, dataset)
    107 def extract(self, dataset: Dataset) -> np.array:
--> 108     return np.concatenate([f.extract(dataset) for f in self.features], axis=1)

File ~/repositories/projects/GANS/privacy-sdg-toolbox/tapas/attacks/set_classifiers.py:108, in <listcomp>(.0)
...
--> 101     cidx = [categories.index(c) for c in col_data]
    102     col_data_onehot[np.arange(len(col_data)), cidx] = 1
    104     return col_data_onehot

ValueError: nan is not in list

Answer 1 · 2023-06-21T15:10:03.000Z

Hi, sorry for the late reply.

You are correct, the quickstart is missing tapas.attacks. before FeatureBasedSetClassifier.
There is also a typo (missing a t in ShadowModellingAttack)!

These will be fixed shortly.
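
For reference, here is a sketch of the quickstart cell with only the two fixes above applied (the `tapas.attacks.` prefix and the corrected spelling). It has not been checked against the updated quickstart: the constructor arguments are copied as-is from the cell in the report, and the wrapper function `build_groundhog_attacker` is a hypothetical name, used here so that the imports only run when the function is called.

```python
def build_groundhog_attacker():
    """Build the shadow-modelling attacker from the quickstart cell, with both fixes applied.

    Assumes the toolbox is installed (e.g. via `poetry install`) and importable as `tapas`.
    """
    from sklearn.ensemble import RandomForestClassifier
    import tapas.attacks

    return tapas.attacks.ShadowModellingAttack(  # was: ShadowModellingAtack
        tapas.attacks.FeatureBasedSetClassifier(  # was: missing the tapas.attacks. prefix
            tapas.attacks.NaiveSetFeature()
            + tapas.attacks.HistSetFeature()
            + tapas.attacks.CorrSetFeature(),
            RandomForestClassifier(n_estimators=100),
        ),
        label="Groundhog",
    )
```

In a notebook, `attacker = build_groundhog_attacker()` would then stand in for the original cell.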

I am a bit surprised by the second error. What dataset are you using? (We do not support NaNs at this point.)
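
To illustrate how a missing value can lead to exactly this message: the lookup in the traceback calls `categories.index(c)` for every value in a column, and a NaN never compares equal to any category. The sketch below uses only the standard library and is not the toolbox's code; `category_indices` and `missing_positions` are hypothetical helpers, and the sample column is made up.

```python
import math


def category_indices(col_data, categories):
    """Mimic the lookup from the traceback: map each value to its category index."""
    return [categories.index(c) for c in col_data]


def missing_positions(col_data):
    """Return the positions of values that are None or NaN."""
    return [
        i
        for i, v in enumerate(col_data)
        if v is None or (isinstance(v, float) and math.isnan(v))
    ]


categories = ["a", "b", "c"]
column = ["a", float("nan"), "c"]  # made-up column with one missing value

# The NaN is not one of the categories, so the lookup raises ValueError.
try:
    category_indices(column, categories)
    message = None
except ValueError as err:
    message = str(err)

# Locate the missing entries first, then encode only the remaining values.
missing = missing_positions(column)
clean = [v for i, v in enumerate(column) if i not in missing]
indices = category_indices(clean, categories)
```

Running a check like `missing_positions` on each column before passing the data to the attack would show whether the dataset preparation left any missing values behind.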

Answer 2 · 2023-07-19T11:54:33.000Z

Hi, thanks! I just found that the second error was a mistake on my part from preparing the UK census data. Sorry!