alan-turing-institute/tapas

Get errors in quickstart

f-hafner opened this issue · 2 comments

When following the steps in the quickstart, I get some errors.

To reproduce:

  • OS: Ubuntu 20
  • git clone git@github.com:alan-turing-institute/privacy-sdg-toolbox.git
  • cd privacy-sdg-toolbox
  • poetry install (Python 3.9)
  • Create a new notebook, run it with the project's .venv (I use VS Code), and follow the steps in the quickstart

The specific errors I get

First, there is something wrong in this cell:

from sklearn.ensemble import RandomForestClassifier

attacker = tapas.attacks.ShadowModellingAtack(
   FeatureBasedSetClassifier(
      tapas.attacks.NaiveSetFeature() + tapas.attacks.HistSetFeature() + tapas.attacks.CorrSetFeature(),
      RandomForestClassifier(n_estimators = 100)
   ),
   label = "Groundhog"
)
  • RandomForestClassifier vs. FeatureBasedSetClassifier: why does the cell import RandomForestClassifier first?
  • FeatureBasedSetClassifier is never imported.
  • Either way, running the cell raises AttributeError: module 'tapas.attacks' has no attribute 'ShadowModellingAtack'

Second, when training the Groundhog attack:

attacker = tapas.attacks.GroundhogAttack()
attacker.train(threat_model, num_samples=1000)

I get a ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[19], line 3
      1 attacker = tapas.attacks.GroundhogAttack()
----> 3 attacker.train(threat_model, num_samples=1000)

File ~/repositories/projects/GANS/privacy-sdg-toolbox/tapas/attacks/shadow_modelling.py:87, in ShadowModellingAttack.train(self, threat_model, num_samples)
     84 synthetic_datasets, labels = threat_model.generate_training_samples(num_samples)
     86 # Fit the classifier to the data.
---> 87 self.classifier.fit(synthetic_datasets, labels)
     88 self.trained = True

File ~/repositories/projects/GANS/privacy-sdg-toolbox/tapas/attacks/set_classifiers.py:144, in FeatureBasedSetClassifier.fit(self, datasets, labels)
    143 def fit(self, datasets: list[Dataset], labels: list[int]):
--> 144     self.classifier.fit(self.features(datasets), labels)

File ~/repositories/projects/GANS/privacy-sdg-toolbox/tapas/attacks/set_classifiers.py:85, in SetFeature.__call__(self, *args, **kwargs)
     84 def __call__(self, *args, **kwargs):
---> 85     return self.extract(*args, **kwargs)

File ~/repositories/projects/GANS/privacy-sdg-toolbox/tapas/attacks/set_classifiers.py:108, in CombinedSetFeatures.extract(self, dataset)
    107 def extract(self, dataset: Dataset) -> np.array:
--> 108     return np.concatenate([f.extract(dataset) for f in self.features], axis=1)

File ~/repositories/projects/GANS/privacy-sdg-toolbox/tapas/attacks/set_classifiers.py:108, in (.0)
...
--> 101     cidx = [categories.index(c) for c in col_data]
    102     col_data_onehot[np.arange(len(col_data)), cidx] = 1
    104     return col_data_onehot

ValueError: nan is not in list

Hi, sorry for the late reply.

  1. You are correct: the quickstart is missing the tapas.attacks. prefix before FeatureBasedSetClassifier.
  2. There is also a typo: ShadowModellingAtack is missing a "t" (it should be ShadowModellingAttack).

These will be fixed shortly.

I am a bit surprised by the second error. What dataset are you using? (We do not support NaNs at this point.)
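The last frame of the traceback fails at categories.index(c). list.index raises a ValueError whenever a value is not among the expected categories, and a NaN standing in for a missing cell is not one of them here. The following is a minimal plain-Python reproduction of that line. category_indices is a hypothetical stand-in written for illustration, not the actual tapas function.

```python
# Sketch reproducing the failing line from the traceback:
#     cidx = [categories.index(c) for c in col_data]
# `category_indices` is a hypothetical stand-in, not the tapas function itself.


def category_indices(col_data, categories):
    """Return the position of each value of col_data within categories."""
    return [categories.index(c) for c in col_data]


categories = ["female", "male"]
print(category_indices(["male", "female"], categories))  # [1, 0]

# A missing cell loaded as NaN is not one of the expected categories,
# so list.index raises the same ValueError as in the traceback.
try:
    category_indices(["male", float("nan")], categories)
except ValueError as err:
    print(err)  # nan is not in list
```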

Hi, thanks! I just found that the second error was caused by a mistake on my part when preparing the UK census data. Sorry!
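Because tapas does not currently support NaNs, one option is to check the prepared data for missing values before building the dataset and threat model. The sketch below uses plain Python and assumes the records are available as a list of dicts, one per row. The helpers is_missing and columns_with_missing are hypothetical and not part of tapas, and the rows are toy data rather than the census schema.

```python
import math
from typing import Any, Iterable, List, Mapping


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def columns_with_missing(rows: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the names of columns with at least one missing value, in first-seen order."""
    found: List[str] = []
    for row in rows:
        for column, value in row.items():
            if is_missing(value) and column not in found:
                found.append(column)
    return found


# Toy rows for illustration only.
rows = [
    {"age": 34, "sex": "female", "region": "London"},
    {"age": 51, "sex": "male", "region": float("nan")},
]
print(columns_with_missing(rows))  # ['region']

# Drop incomplete rows (or impute them) before handing the data to tapas.
complete_rows = [r for r in rows if not any(is_missing(v) for v in r.values())]
print(len(complete_rows))  # 1
```

If the data is loaded with pandas instead, df.isna().any() gives the same per-column check and df.dropna() removes incomplete rows.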