aspuru-guzik-group/chemical_vae

How is limit_data used in exp.json?

abhik1368 opened this issue · 2 comments

When training on a million molecules, should we keep limit_data at 5000 or change it? Which parameters matter when training on a set of 1 million molecules?

Hello. According to the code in train_vae.py:

if 'limit_data' in params.keys():
    sample_idx = np.random.choice(np.arange(len(smiles)), params['limit_data'], replace=False)
    smiles = list(np.array(smiles)[sample_idx])
    if params['do_prop_pred'] and ('data_file' in params):
        if "reg_prop_tasks" in params:
            Y_reg = Y_reg[sample_idx]
        if "logit_prop_tasks" in params:
            Y_logit = Y_logit[sample_idx]

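So if you want to train on your full set of a million molecules, you should remove the "limit_data" key from exp.json. While the key is present, only a random subset of that many molecules (5000 in your case) is drawn, without replacement, and used for training.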
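
To make the effect concrete, here is a minimal standalone sketch of the same logic. It is not the repository's code: `select_training_smiles` is a hypothetical helper, the JSON string stands in for a real exp.json, the molecule names are placeholders, and the standard library's `random.sample` is swapped in for `np.random.choice(..., replace=False)` so the snippet runs without NumPy.

```python
import json
import random

# Hypothetical exp.json contents; only the "limit_data" key comes from the thread.
exp_json = '{"limit_data": 5000}'


def select_training_smiles(smiles, params, seed=None):
    """Return a random subset if 'limit_data' is set, otherwise the full list."""
    rng = random.Random(seed)
    if 'limit_data' in params:
        # random.sample draws without replacement, like replace=False in the excerpt.
        return rng.sample(list(smiles), params['limit_data'])
    return list(smiles)


# Placeholder strings standing in for real SMILES.
smiles = [f"mol_{i}" for i in range(10000)]

# With the key present, only limit_data molecules are kept.
params = json.loads(exp_json)
limited = select_training_smiles(smiles, params, seed=0)

# With the key removed, every molecule is kept.
params_full = dict(params)
params_full.pop('limit_data', None)
full = select_training_smiles(smiles, params_full)

print(len(limited), len(full))
```

If you would rather keep the key, note that it must not exceed the number of molecules in your file: both `random.sample` here and `np.random.choice` with `replace=False` raise a ValueError when asked for more samples than there are items.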