Possible mistake in validity w/o correction

Question

asiraudin opened this issue 2 years ago · 1 comments

Hi,

Thanks for sharing your code. I have a question about how you compute validity without correction.

This is your procedure to compute this metric:

gen_mols, num_mols_wo_correction = gen_mol(x, adj, self.configt.data.data)
num_mols = len(gen_mols)
...
logger.log(f'validity w/o correction: {num_mols_wo_correction / num_mols}')

and here is the code for the gen_mol function:

def gen_mol(x, adj, dataset, largest_connected_comp=True):    
    # x: 32, 9, 5; adj: 32, 4, 9, 9
    x = x.detach().cpu().numpy()
    adj = adj.detach().cpu().numpy()

    if dataset == 'QM9':
        atomic_num_list = [6, 7, 8, 9, 0]
    else:
        atomic_num_list = [6, 7, 8, 9, 15, 16, 17, 35, 53, 0]
    mols, num_no_correct = [], 0
    for x_elem, adj_elem in zip(x, adj):
        mol = construct_mol(x_elem, adj_elem, atomic_num_list)
        cmol, no_correct = correct_mol(mol)
        if no_correct: num_no_correct += 1
        vcmol = valid_mol_can_with_seg(cmol, largest_connected_comp=largest_connected_comp)
        mols.append(vcmol)
    mols = [mol for mol in mols if mol is not None]
    return mols, num_no_correct

num_mols_wo_correction (num_no_correct inside the function) is indeed the number of molecules that are valid before correction, but num_mols does not seem to be the number of generated molecules. gen_mol drops every entry of mols that is None, i.e. the molecules that are not valid and can't be corrected, and the filtered list becomes gen_mols in your evaluation procedure. The denominator is therefore smaller than the number of generated molecules, and the metric you are actually computing is the number of molecules valid before correction over the final number of valid molecules.

If I'm correct, your function should instead return something like mols, num_no_correct, num_generated, where num_generated is len(x), and your validity should be num_no_correct / num_generated.
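To make the difference concrete, here is a minimal sketch comparing the two denominators. The helper validity_ratios and all counts are hypothetical and chosen only for illustration; they are not taken from the repository or from any experiment.

```python
def validity_ratios(num_no_correct, num_kept, num_generated):
    """Return (ratio over kept molecules, ratio over all generated molecules).

    All three arguments are illustrative counts, not values from the repository.
    """
    if num_kept <= 0 or num_generated <= 0:
        raise ValueError("counts must be positive")
    return num_no_correct / num_kept, num_no_correct / num_generated


# Hypothetical batch: 100 generated, 80 valid before correction,
# 90 left in mols after the None filter.
logged, proposed = validity_ratios(num_no_correct=80, num_kept=90, num_generated=100)
print(f"{logged:.3f} vs {proposed:.3f}")  # 0.889 vs 0.800

# If nothing is filtered out, the denominators are equal and so are the ratios.
same_a, same_b = validity_ratios(num_no_correct=80, num_kept=100, num_generated=100)
```

The two ratios differ only when some molecules are dropped by the None filter; when every generated molecule is kept, they coincide.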

Is that a mistake from the MoFlow paper, or is there something I haven't understood?

Answer 1 · 2022-08-11T05:34:44.000Z

Hi, thank you for your interest in our work!
As reported in Tables 10 and 11 of our paper and also in the MoFlow paper,
the validity with the correction procedure is 100%, so no molecule is filtered out and there is no difference in practice.
You can of course replace the denominator with x.shape[0].