Daisy: Relational Data Synthesis using Generative Adversarial Networks

Technical Report: Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration

https://github.com/ruclty/Daisy/blob/master/daisy.pdf

Requirements

Before running the code, make sure your Python version is above 3.6, then install the required packages:

pip install -r requirements.txt

Datasets

All public datasets we used in our work can be downloaded from the datasets page.

Parameters

Write a .json file as the configuration. The keys should include:

  • name: required, name of the output file
  • train: required, path to the training file
  • sample: required, path to the sampling file
  • normalize_cols: required, list of indices of the columns to normalize
  • gmm_cols: required, list of indices of the GMM columns
  • one-hot_cols: required, list of indices of the one-hot columns
  • ordinal_cols: required, list of indices of the ordinal columns
  • model: required, generator model, LGAN (LSTM) or VGAN (MLP)
  • dis_model: optional, discriminator model, lstm or mlp; default mlp
  • n_epochs: required, number of training epochs
  • steps_per_epoch: required, number of steps per epoch
  • n_search: required, number of times to train
  • rand_search: required, whether to search hyper-parameters randomly
  • param: required if rand_search is 'no', hyper-parameters of the neural network
  • train_method: required, training method
  • label: required if train_method is a conditional training method, name of the label column
  • KL: optional, whether to use the KL loss in training; default 'yes'
  • ratio: optional, ratio of the number of sampled records to the number of real records; default 1
  • sample_times: optional, number of times to sample; default 1

The "params" folder contains some examples. You can run the code with those parameter files directly, or write your own parameter file to train on a new dataset.
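As a rough illustration of the keys listed above, the following sketch builds a configuration as a Python dict, checks it against the required keys, and prints it as JSON. The helper check_config is hypothetical and not part of this repository. All values (file paths, column indices, epoch counts) are made up for illustration, the 'yes' value for rand_search is assumed by analogy with the 'no' and 'yes' values mentioned above, and the train_method value is a placeholder; see the files in "params" for real settings.

```python
import json

# Keys the list above marks as unconditionally required.
REQUIRED_KEYS = [
    "name", "train", "sample",
    "normalize_cols", "gmm_cols", "one-hot_cols", "ordinal_cols",
    "model", "n_epochs", "steps_per_epoch", "n_search",
    "rand_search", "train_method",
]


def check_config(config):
    """Return a list of problems found in a configuration dict (empty if none)."""
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_KEYS
        if key not in config
    ]
    # "param" is required only when random search is turned off.
    if config.get("rand_search") == "no" and "param" not in config:
        problems.append("missing key: param (required when rand_search is 'no')")
    return problems


# Hypothetical example; every value below is an assumption for illustration.
example_config = {
    "name": "my_dataset",
    "train": "data/my_dataset_train.csv",   # hypothetical path
    "sample": "data/my_dataset_sample.csv",  # hypothetical path
    "normalize_cols": [0, 1],
    "gmm_cols": [2],
    "one-hot_cols": [3, 4],
    "ordinal_cols": [],
    "model": "VGAN",
    "n_epochs": 10,
    "steps_per_epoch": 100,
    "n_search": 1,
    "rand_search": "yes",  # assumed value; 'no' would also require "param"
    "train_method": "<training method>",  # placeholder, see the "params" examples
}

print(check_config(example_config))
print(json.dumps(example_config, indent=2))
```

Saving the printed JSON to a file gives a starting point that can then be adjusted against one of the example files in "params".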

Run

Run the code with the following command:

python Daisy/run.py [parameter file]

or run the following command for a quick start:

./run.sh
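To train on several parameter files in sequence, the run command can be wrapped in a small script. This is a minimal sketch, not part of the repository: the helpers build_commands and run_all are hypothetical, and the sketch assumes it is started from the repository root, that the parameter files in "params" end in .json, and that Daisy/run.py takes the parameter file as its only argument, as in the command above.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(param_dir="params", script="Daisy/run.py"):
    """Build one `python Daisy/run.py <file>` command per .json file in param_dir."""
    files = sorted(Path(param_dir).glob("*.json"))
    # sys.executable reuses the interpreter that is running this script.
    return [[sys.executable, script, str(path)] for path in files]


def run_all(param_dir="params"):
    """Run the commands one after another, stopping at the first failure."""
    for command in build_commands(param_dir):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)


# Usage (from the repository root):
# run_all()
```

Because check=True makes subprocess.run raise on a non-zero exit code, a failing run stops the loop instead of silently continuing with the next file.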