mkusner/grammarVAE

Latent dimensionality consistency

Closed this issue · 2 comments

Hi,
While trying to figure out the reason for the very low accuracy, I noticed some inconsistencies. For the grammar-based training, the default in train_zinc.py is LATENT = 56, but in model_zinc.py, latent_rep_size = 2 appears twice: in def create(...) and def load(...).
For the string-based training, the default in train_zinc_str.py is also LATENT = 56, but in model_zinc_str.py, latent_rep_size = 292 appears twice: in def create(...) and def load(...).
Is this expected? Is it necessary to keep them consistent for each type of training?
In keras-molecule, the default latent dimensionality is 292. Is there any particular reason you chose 56 for your grammar-based latent space?
Thanks!
Toushi68

Hey Toushi68,

Sorry for the confusion. We used a latent dimensionality of 56 in all of our experiments, except when we visualize the latent space (Figure 6 in the appendix: https://arxiv.org/pdf/1703.01925.pdf), where we use a dimensionality of 2.
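
For anyone else tripped up by the differing defaults: the value in the model file's function signature only matters if the caller omits it, and the training script passes its own LATENT value explicitly. A minimal sketch of that interaction (a hypothetical simplification, not the repo's actual class, which builds a full Keras model):

```python
class MoleculeVAE:
    def create(self, charset, latent_rep_size=2):
        # The default of 2 is only a fallback; it matches the 2-D
        # latent-space visualization setting, and the training
        # script overrides it with its own LATENT constant.
        self.charset = charset
        self.latent_rep_size = latent_rep_size

LATENT = 56  # as in train_zinc.py
model = MoleculeVAE()
model.create(charset=['C', 'N', 'O'], latent_rep_size=LATENT)
print(model.latent_rep_size)  # -> 56; the default 2 never applies
```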

We chose 56 because it was used in the CVAE paper: https://pubs.acs.org/doi/full/10.1021/acscentsci.7b00572. I believe they found this value by tuning the hyperparameter with Bayesian optimization.

So my recommendation is to always keep it 56-dimensional (unless you do more tuning :))).
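
If you do want to re-tune it, here is a hedged sketch of what such a Bayesian-optimization search over the latent dimensionality could look like with scikit-optimize. This is not part of this repo; `validation_loss` is a hypothetical stand-in for training the VAE at a given size and returning its validation reconstruction loss:

```python
from skopt import gp_minimize
from skopt.space import Integer

def validation_loss(params):
    latent_dim = params[0]
    # Placeholder objective: in practice, train the VAE with this
    # latent_dim and return the measured validation loss.
    return abs(latent_dim - 56) / 56.0

result = gp_minimize(
    validation_loss,
    [Integer(2, 128)],  # search range for the latent dimensionality
    n_calls=20,
    random_state=0,
)
print("best latent_dim:", result.x[0])
```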