biolib/openprotein

Error in drmsd computation

ecvgit opened this issue · 1 comment

I think the drmsd computation in the function calc_avg_drmsd_over_minibatch is wrong.

https://github.com/OpenProtein/openprotein/blob/e4e2e0c8597f1f113b7074d0e6b223f8d019138e/util.py#L267

Here actual_coords_list[idx] is a tensor of size [seq_len, 9].
The 9 values are the x, y, z coordinates of the three backbone atoms C', C-alpha and N.
You want to convert it into a tensor of size [seq_len*3, 3], with one row per atom.

But the current code, which calls transpose(0,1) before view(-1,3), does not convert the coordinates correctly:

>>> import torch
>>> torch.tensor([[1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18]]).transpose(0,1).contiguous().view(-1,3)
tensor([[  1,  10,   2],
        [ 11,   3,  12],
        [  4,  13,   5],
        [ 14,   6,  15],
        [  7,  16,   8],
        [ 17,   9,  18]])

You can see the coordinates are mangled: the first row, [1, 10, 2], mixes values from both residues instead of holding one atom's x, y, z.

I think the correct code should be

actual_coords = actual_coords_list[idx].view(-1, 3)

>>> torch.tensor([[1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18]]).view(-1,3) 
tensor([[  1,   2,   3],
        [  4,   5,   6],
        [  7,   8,   9],
        [ 10,  11,  12],
        [ 13,  14,  15],
        [ 16,  17,  18]])
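For context, here is a minimal sketch of a dRMSD calculation built on the corrected reshape. It assumes the common definition (root mean square difference between the predicted and actual inter-atom distance matrices, taken over distinct atom pairs); `to_atom_coords` and `drmsd` are hypothetical helpers, not the code in util.py, whose normalization may differ.

```python
import torch


def to_atom_coords(coords: torch.Tensor) -> torch.Tensor:
    """Reshape [seq_len, 9] backbone coordinates to [seq_len * 3, 3] (hypothetical helper).

    Each row of 9 values is split into three consecutive (x, y, z) triples,
    so every output row is one atom and residue order is preserved.
    reshape(-1, 3) gives the same result as view(-1, 3) on contiguous input
    and also accepts non-contiguous input.
    """
    return coords.reshape(-1, 3)


def drmsd(predicted: torch.Tensor, actual: torch.Tensor) -> torch.Tensor:
    """Distance-based RMSD between two [n_atoms, 3] coordinate sets (hypothetical helper)."""
    # Pairwise Euclidean distance matrices, each of shape [n_atoms, n_atoms].
    d_pred = torch.cdist(predicted, predicted)
    d_true = torch.cdist(actual, actual)
    # Count each unordered atom pair once (strict upper triangle).
    n = predicted.shape[0]
    i, j = torch.triu_indices(n, n, offset=1)
    diff = d_pred[i, j] - d_true[i, j]
    return torch.sqrt(torch.mean(diff ** 2))
```

Because it depends only on internal distances, dRMSD is unchanged by translating or rotating either structure; feeding it mangled atom rows, by contrast, generally changes the distance matrix and therefore the score.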

Hey @ecvgit, thanks for flagging this. If this is the case, the same problem presumably also exists here, doesn't it? https://github.com/OpenProtein/openprotein/blob/e4e2e0c8597f1f113b7074d0e6b223f8d019138e/openprotein.py#L84 I'll set up some proper unit tests soon so that issues like this can be verified.
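Along the lines of the unit tests mentioned above, a minimal pytest-style sketch that checks the reshape directly: every output row must be one atom's contiguous (x, y, z) slice of the original [seq_len, 9] row. The test name is hypothetical, and the layout (three consecutive x, y, z triples per residue) is taken from the issue description.

```python
import torch


def test_backbone_reshape_keeps_atoms_together():
    seq_len = 4
    # Deterministic [seq_len, 9] input: x, y, z for 3 backbone atoms per residue.
    coords = torch.arange(seq_len * 9, dtype=torch.float32).reshape(seq_len, 9)
    atoms = coords.view(-1, 3)
    assert atoms.shape == (seq_len * 3, 3)
    for r in range(seq_len):
        for a in range(3):
            # Atom a of residue r must be the slice coords[r, 3a:3a+3].
            assert torch.equal(atoms[3 * r + a], coords[r, 3 * a:3 * a + 3])
    # The transpose-then-view version from the issue interleaves residues.
    mangled = coords.transpose(0, 1).contiguous().view(-1, 3)
    assert not torch.equal(mangled, atoms)
```

Checking against explicit slices, rather than against a second reshape, keeps the test from sharing a bug with the code under test.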