sdv-dev/TGAN

Error with large number of columns

nabarunaguha opened this issue · 1 comment

Hi @ManuelAlvarezC!

I am really intrigued to work with TGAN, and this is also my first project working with a GAN in general.

Recently I trained it on tabular data that contains 1000+ columns, and it gave the following error:
InvalidArgumentError: Cannot serialize protocol buffer of type tensorflow.GraphDef as the serialized size (3015923402 bytes) would be larger than the limit (2147483647 bytes)

I understand that I would need to apply dimensionality reduction to train TGAN on this particular dataset, but there are correlations among the columns that would be lost if I used a technique like PCA.

Is there any way to use TGAN with a large number of columns?

Regards,
Nabaruna

TGAN uses a dedicated hidden layer to generate each column. These hidden layers consume a large amount of memory, so it is not easy to adapt TGAN to datasets with 1000+ columns. We developed a new framework called CTGAN, which can work on datasets with 1000+ columns.
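A minimal sketch of how CTGAN can be used, assuming the `ctgan` package from PyPI (in some releases the class is named `CTGANSynthesizer`, so check the CTGAN README for the exact API of your installed version); the file path and column names below are placeholders:

```python
# Minimal sketch: fit CTGAN on a wide table and sample synthetic rows.
# Assumes the `ctgan` PyPI package; file path and column names are placeholders.
import pandas as pd
from ctgan import CTGAN  # some releases expose this as CTGANSynthesizer

# Load your 1000+ column table (replace with your own loading code).
data = pd.read_csv("your_table.csv")

# List the columns that hold categorical values; the rest are treated as continuous.
discrete_columns = ["example_categorical_column"]

model = CTGAN(epochs=300)
model.fit(data, discrete_columns)

# Draw synthetic rows with the same schema as the training table.
synthetic = model.sample(1000)
```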

Thanks,
Lei