biolib/openprotein

soft_to_angle theory

TrentBrick opened this issue · 4 comments

Would it be possible to share a link to the research or reasoning behind the soft_to_angle Module for someone new to structural protein problems?

My current hunch is that you have run a mixture model on the Pfam database and found the average angle conformations of the different families. You then use a LogSoftmax activation function to let each amino acid choose which of the omega, psi and phi angles it wants from this table of options. You then take these values and use sin, cos and arctan to convert them into angles?
Why does the mixture model have 500 clusters, how was the mixture_model table generated, and why is there a 90:10 pos/neg omega ratio that is then randomly shuffled in?
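To make the hunch above concrete, here is a minimal sketch of the mechanism as I understand it: softmax weights over a table of candidate angles, a weighted average of their sines and cosines, then atan2 to get back a single angle. This is pure Python for illustration and is not taken from the openprotein source; the function names, the tiny two-entry alphabet and the logits are all made up.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_to_angle_sketch(logits, alphabet):
    """Blend a table of candidate angles (radians) into one angle.

    logits:   one unnormalised score per alphabet entry (hypothetical input)
    alphabet: candidate angles in radians, same length as logits
    """
    probs = softmax(logits)
    # Average on the unit circle rather than on the raw angle values,
    # so that angles near +pi and -pi are treated as neighbours.
    y = sum(p * math.sin(a) for p, a in zip(probs, alphabet))
    x = sum(p * math.cos(a) for p, a in zip(probs, alphabet))
    return math.atan2(y, x)

# Illustrative two-entry alphabet: 170 and -170 degrees.
alphabet = [math.radians(170.0), math.radians(-170.0)]
blended = soft_to_angle_sketch([0.0, 0.0], alphabet)
print(math.degrees(blended))
```

If that is right, it would explain the sin/cos/arctan step: averaging 170 and -170 degrees directly would give 0, while averaging on the unit circle gives an angle at the +/-180 degree boundary, which is what you would want for something like omega.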

Again I am a noob so pointers to any papers or other grounded reasoning for this approach would be really appreciated.

Along similar lines, in preprocessing.py you take the ProteinNet tertiary data, which is in coordinate format, convert it into angles, and then convert it back to coordinates. Why?

Starting from line 132:

    angles, batch_sizes = calculate_dihedral_angles_over_minibatch(pos, [len(prim)], use_gpu=use_gpu)
    tertiary, _ = get_backbone_positions_from_angular_prediction(angles, batch_sizes, use_gpu=use_gpu)
    tertiary = tertiary.squeeze(1)

Any further insight on this would be really appreciated!

For inspiration on the model design, you're probably best off reading https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30076-6
In preprocessing.py we're currently converting to angles and back again to ensure that the distances between amino acids are exactly the ones we use in the pnerf module. With ideal bond lengths and angles, going from coordinates -> angles -> coordinates would reproduce the original structure. However, the original (measured) coordinates can contain some noise, so this is essentially a preprocessing step to remove it.
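As a rough illustration of that round trip, the sketch below extracts a dihedral angle from four points and then rebuilds the fourth point from the angle using a fixed bond length and bond angle (a NeRF-style placement). It is a minimal pure-Python sketch, not the openprotein or pnerf code: the helper names are hypothetical, and the bond length of 1.5 and bond angle of 110 degrees are arbitrary illustrative values rather than the constants the library uses.

```python
import math

def sub(a, b): return [a[i] - b[i] for i in range(3)]
def dot(a, b): return sum(a[i] * b[i] for i in range(3))
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]
def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) around the p1-p2 bond."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)   # normals of the two planes
    m1 = cross(unit(b1), n1)
    return math.atan2(dot(m1, n2), dot(n1, n2))

def place_atom(a, b, c, length, bond_angle, torsion):
    """Place d so that |c-d| = length, angle(b,c,d) = bond_angle and
    dihedral(a,b,c,d) = torsion. All angles in radians."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    # Local coordinates of d in the frame (bc, m, n) centred on c.
    d_local = [-length * math.cos(bond_angle),
               length * math.sin(bond_angle) * math.cos(torsion),
               length * math.sin(bond_angle) * math.sin(torsion)]
    return [c[i] + d_local[0] * bc[i] + d_local[1] * m[i] + d_local[2] * n[i]
            for i in range(3)]

# Three fixed atoms and a "measured" fourth atom with slightly off geometry.
a, b, c = [0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.4, 0.0, 0.0]
d_measured = [1.9, 0.7, 1.2]

# Coordinates -> angle -> coordinates, with assumed ideal length and angle.
torsion = dihedral(a, b, c, d_measured)
d_rebuilt = place_atom(a, b, c, 1.5, math.radians(110.0), torsion)
print(d_measured, d_rebuilt)
```

The rebuilt atom keeps the measured torsion but sits at exactly the assumed bond length and bond angle, which is the sense in which the round trip cleans up the coordinates.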

Thanks, I read this paper a while ago and didn't remember the right side of figure 2 with the "torsional alphabet"; it may have been added in a later version.

I still don't see any information about using a mixture model, nor any actual mixture model angles in the RGN GitHub repo (you have three different files for these). Did you generate these yourself or correspond with AlQuraishi to get them?

And the preprocessing.py noise removal makes sense.