k-anonymity

Evaluating a variety of k-anonymity techniques.

Primary language: Python. License: MIT.

Data anonymization using k-Anonymity

✔️ Experiments

  • Provides 5 k-anonymization methods:
    • Datafly
    • Incognito
    • Topdown Greedy
    • Classic Mondrian
    • Basic Mondrian
  • Implements 3 anonymization metrics:
    • Equivalence class size metric (CAVG)
    • Discernibility Metric (DM)
    • Normalized Certainty Penalty (NCP)
  • Implements 3 classification models:
    • Random Forests
    • Support Vector Machines
    • K-Nearest Neighbors
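As a rough illustration of two of the metrics listed above, the sketch below computes DM and CAVG from equivalence class sizes using their usual textbook definitions. The function names and sample rows are hypothetical and are not taken from this repository, whose implementation may differ (for example in how suppressed records are counted).

```python
from collections import Counter


def equivalence_class_sizes(records, qid_indices):
    """Count how many records share each combination of QID values."""
    return Counter(tuple(row[i] for i in qid_indices) for row in records)


def discernibility_metric(class_sizes):
    """DM: every record is charged the size of its equivalence class."""
    return sum(size * size for size in class_sizes.values())


def average_class_size_metric(class_sizes, k):
    """CAVG: (number of records / number of classes) / k; 1.0 is ideal."""
    total_records = sum(class_sizes.values())
    return total_records / len(class_sizes) / k


# Hypothetical anonymized rows: (age range, workclass, income).
rows = [
    ("20-30", "Government", "<=50K"),
    ("20-30", "Government", ">50K"),
    ("30-40", "Non-Government", "<=50K"),
    ("30-40", "Non-Government", "<=50K"),
    ("30-40", "Non-Government", ">50K"),
]

# The first two columns are treated as the QID attributes.
sizes = equivalence_class_sizes(rows, qid_indices=(0, 1))
print(discernibility_metric(sizes))            # 2*2 + 3*3 = 13
print(average_class_size_metric(sizes, k=2))   # (5 / 2) / 2 = 1.25
```

Lower values are better for both metrics: large equivalence classes inflate DM, and CAVG grows as the average class size exceeds k.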

📖 Reports

  • Report edit link: link
  • Slide link: link

Folder Structure

  • A dataset must come with a .csv file containing the feature data and a hierarchies folder containing predefined generalization hierarchies for its QID (quasi-identifier) attributes.
this repo
│   anonymize.py
│
└───data  
│   │
│   └───adult
│       │   adult.csv
│       └───hierarchies
│       │     adult_hierarchy_workclass.csv
│       │     ....
  • Here is an example generalization hierarchy for the 'workclass' attribute of the ADULT dataset, stored in adult_hierarchy_workclass.csv, a CSV file that uses ";" as the delimiter:
Private;Non-Government;*
Self-emp-not-inc;Non-Government;*
Self-emp-inc;Non-Government;*
Federal-gov;Government;*
Local-gov;Government;*
State-gov;Government;*
Without-pay;Unemployed;*
Never-worked;Unemployed;*

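Each row lists a leaf value followed by its ancestors, from most specific to most general, so the file describes a three-level tree rooted at "*".

[Figure: tree of the workclass generalization hierarchy]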
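The hierarchy file format shown above can be read with Python's csv module. The following is a minimal sketch, not the repository's own loader: load_hierarchy and generalize are hypothetical helper names, and the input is an in-memory excerpt of the rows listed above.

```python
import csv
import io

# Excerpt of adult_hierarchy_workclass.csv, copied from the rows above.
HIERARCHY_CSV = """\
Private;Non-Government;*
Federal-gov;Government;*
Without-pay;Unemployed;*
"""


def load_hierarchy(file_obj):
    """Map each leaf value to its list of levels: [leaf, parent, ..., root]."""
    reader = csv.reader(file_obj, delimiter=";")
    return {row[0]: row for row in reader if row}


def generalize(hierarchy, value, level):
    """Return `value` generalized `level` steps up, capped at the root."""
    levels = hierarchy[value]
    return levels[min(level, len(levels) - 1)]


hierarchy = load_hierarchy(io.StringIO(HIERARCHY_CSV))
print(generalize(hierarchy, "Federal-gov", 0))  # Federal-gov
print(generalize(hierarchy, "Federal-gov", 1))  # Government
print(generalize(hierarchy, "Private", 2))      # *
```

To read a real file instead, pass an open file handle, for example open("data/adult/hierarchies/adult_hierarchy_workclass.csv", newline="").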

🌟 Executing

To anonymize a dataset, run:

python anonymize.py --method=<model_type> --k=<k-anonymity> --dataset=<dataset_name>
  • model_type: [mondrian | classic_mondrian | mondrian_ldiv | topdown | cluster | datafly]
  • dataset_name: [adult | cahousing | cmc | mgm | informs | italia]

Results are written to the results/{dataset}/{method} folder.
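To run several configurations in a row, the command above can be assembled programmatically. This is only a sketch: build_command is a hypothetical helper, the values of k are arbitrary examples, and only the flags documented above are used.

```python
import itertools
import sys


def build_command(method, k, dataset):
    """Build the argument list for one anonymize.py run."""
    return [
        sys.executable,
        "anonymize.py",
        f"--method={method}",
        f"--k={k}",
        f"--dataset={dataset}",
    ]


methods = ["mondrian", "datafly"]  # two of the documented model_type values
k_values = [2, 10]                 # example values, chosen arbitrarily

commands = [
    build_command(method, k, "adult")
    for method, k in itertools.product(methods, k_values)
]

for cmd in commands:
    # Print the command without the interpreter path.
    print(" ".join(cmd[1:]))
```

Each list can then be passed to subprocess.run(cmd, check=True) from the repository root to perform the actual run.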

To run the evaluation metrics on every combination of algorithm, dataset, and value of k, run:

python visualize.py

Results are written to demo/{metrics.png, metrics_ml.png}.

K-Anonymity examples

[Figures: a sample table before anonymization (left) and after anonymization with k = 2 (right)]
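A table is k-anonymous when every combination of QID values it contains is shared by at least k records. The sketch below checks that property; the rows are made-up illustrative data (not the table from the figures), and is_k_anonymous is a hypothetical helper name.

```python
from collections import Counter


def is_k_anonymous(records, qid_columns, k):
    """True if every QID value combination occurs in at least k records."""
    counts = Counter(tuple(rec[col] for col in qid_columns) for rec in records)
    return all(count >= k for count in counts.values())


# Made-up rows before anonymization: every QID combination is unique.
original = [
    {"age": 25, "workclass": "Federal-gov", "income": "<=50K"},
    {"age": 28, "workclass": "State-gov", "income": ">50K"},
    {"age": 33, "workclass": "Private", "income": "<=50K"},
    {"age": 37, "workclass": "Self-emp-inc", "income": ">50K"},
]

# The same rows after generalizing age to ranges and workclass one level up.
anonymized = [
    {"age": "20-30", "workclass": "Government", "income": "<=50K"},
    {"age": "20-30", "workclass": "Government", "income": ">50K"},
    {"age": "30-40", "workclass": "Non-Government", "income": "<=50K"},
    {"age": "30-40", "workclass": "Non-Government", "income": ">50K"},
]

qid = ("age", "workclass")
print(is_k_anonymous(original, qid, k=2))    # False
print(is_k_anonymous(anonymized, qid, k=2))  # True
```

The sensitive attribute (income here) is left out of the QID tuple on purpose: k-anonymity constrains only the quasi-identifiers.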

Evaluation Metrics

Evaluate anonymization using information loss metrics
[Figures: information loss metric plots]
Evaluate anonymization using classification models
[Figures: classification model plots]
