Official GitHub for CTAB-GAN+



This is the official repository for the paper CTAB-GAN+: Enhancing Tabular Data Synthesis. The current code does NOT include the differential privacy part. The code with differential privacy is in this GitHub repository. If you have any questions, please contact z.zhao-8@tudelft.nl for more information.


The required package version


Newer versions of the sklearn package have changed the behavior of sklearn.mixture.BayesianGaussianMixture. Therefore, you should use the required sklearn version to run the code successfully!


Experiment_Script_Adult.ipynb and Experiment_Script_king.ipynb are two example notebooks for training CTAB-GAN+ on the Adult (classification) and king (regression) datasets. The datasets are already in the Real_Datasets folder. The evaluation code is also provided.

Problem type

You can set your dataset's problem type to either Classification or Regression. If there is no problem type, set it to None as follows:

problem_type= {None: None}
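As a minimal sketch, the three possible settings might look like the following. It assumes that the dictionary maps the problem type to the name of the target column; the column names `income` and `price` and the helper `describe_problem_type` are illustrative only and are not part of the repository. Check the example notebooks for the exact form expected by the code.

```python
# Assumption: problem_type is a single-entry dict {task: target_column}.
# The column names below are hypothetical; use the target column of your data.
classification_problem = {"Classification": "income"}
regression_problem = {"Regression": "price"}
no_problem = {None: None}

ALLOWED_TASKS = ("Classification", "Regression", None)


def describe_problem_type(problem_type):
    """Hypothetical helper: validate the dict and return (task, target_column)."""
    if len(problem_type) != 1:
        raise ValueError("problem_type must contain exactly one entry")
    task, target = next(iter(problem_type.items()))
    if task not in ALLOWED_TASKS:
        raise ValueError(f"unknown problem type: {task!r}")
    return task, target


for setting in (classification_problem, regression_problem, no_problem):
    print(describe_problem_type(setting))
```

Validating the setting before training is optional; it only catches a misspelled problem type early.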

For large datasets

If your dataset has a large number of columns, the current code may be unable to encode all of your data, because CTAB-GAN+ wraps the encoded data into an image-like format. To work around this, change lines 378 and 385 in model/synthesizer/ctabgan_synthesizer.py. Each number in the sides list

sides = [4, 8, 16, 24, 32]

is a candidate side length of the image. You can extend the list to [4, 8, 16, 24, 32, 64] or [4, 8, 16, 24, 32, 64, 128] to accept larger datasets.
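To see why a longer list helps, the sketch below shows one plausible way a side length could be chosen: take the smallest side whose square holds the whole encoded row. This is an assumption about the selection logic, not a copy of the repository code, and `pick_side` and `encoded_dim` are hypothetical names.

```python
def pick_side(encoded_dim, sides=(4, 8, 16, 24, 32)):
    """Return the smallest side with side * side >= encoded_dim, else None.

    Assumption: each encoded row is padded and reshaped into a
    side x side square, so it must fit into side * side cells.
    """
    for side in sides:
        if side * side >= encoded_dim:
            return side
    return None  # no candidate is large enough for this row


default_sides = (4, 8, 16, 24, 32)
extended_sides = default_sides + (64,)

# An encoded row of width 1500 does not fit into 32 * 32 = 1024 cells.
print(pick_side(1500, default_sides))
# With 64 added, 64 * 64 = 4096 cells are available.
print(pick_side(1500, extended_sides))
```

Under this assumption, the default list covers encoded rows of up to 1024 values, adding 64 raises the limit to 4096, and adding 128 raises it to 16384.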


To cite this paper, you can use this BibTeX entry:

  title={CTAB-GAN+: Enhancing Tabular Data Synthesis},
  author={Zhao, Zilong and Kunar, Aditya and Birke, Robert and Chen, Lydia Y},
  journal={arXiv preprint arXiv:2204.00401},