mims-harvard/TDC

The meaning of the score in the document 'toy_data/ppi.txt'

ke-ning opened this issue · 2 comments

Thanks to the authors for the excellent work.
When training a PPI model with the 'ppi.txt' file from the 'toy_data' folder, I noticed that each line contains a score. If I want to replace ppi.txt with my own dataset, how can I obtain scores for my pairs of amino acid sequences? Or can I train the model on the pairs directly, without scores? Thank you.

Hi, the score could be a binding affinity. If you don't have one, you can simply label each pair 1 if there is an interaction, then sample many negative pairs and label them 0.
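
The labeling scheme above can be sketched roughly as follows. This is a minimal illustration, not code from TDC: the function names `sample_negative_pairs` and `build_labeled_pairs`, the protein IDs, and the `negatives_per_positive` parameter are all hypothetical, and the column layout of `ppi.txt` is not assumed here. It also assumes that any pair absent from the known interactions can be treated as a negative, which may not hold if some interactions are simply undiscovered.

```python
import random


def sample_negative_pairs(positive_pairs, num_negatives, seed=0):
    """Sample distinct protein pairs that are not among the known interactions.

    Hypothetical helper: pairs are treated as unordered, and the protein pool
    is taken from the proteins that appear in the positive pairs.
    """
    rng = random.Random(seed)
    known = {frozenset(pair) for pair in positive_pairs}
    proteins = sorted({protein for pair in positive_pairs for protein in pair})

    # Number of distinct unordered pairs that are not known positives.
    total_pairs = len(proteins) * (len(proteins) - 1) // 2
    known_distinct = sum(1 for pair in known if len(pair) == 2)
    if num_negatives > total_pairs - known_distinct:
        raise ValueError("not enough non-interacting pairs to sample from")

    negatives = set()
    while len(negatives) < num_negatives:
        a, b = rng.sample(proteins, 2)  # two different proteins
        candidate = frozenset((a, b))
        if candidate not in known:
            negatives.add(candidate)
    return sorted(tuple(sorted(pair)) for pair in negatives)


def build_labeled_pairs(positive_pairs, negatives_per_positive=1, seed=0):
    """Return (protein_a, protein_b, label) rows: 1 for positives, 0 for sampled negatives."""
    num_negatives = negatives_per_positive * len(positive_pairs)
    negatives = sample_negative_pairs(positive_pairs, num_negatives, seed)
    rows = [(a, b, 1) for a, b in positive_pairs]
    rows += [(a, b, 0) for a, b in negatives]
    return rows


if __name__ == "__main__":
    # Made-up interaction pairs, for illustration only.
    positives = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
    for row in build_labeled_pairs(positives, negatives_per_positive=1):
        print(*row, sep="\t")
```

The ratio of negatives to positives is left as a parameter because the reply above does not specify one; the default of 1 is only a placeholder.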

Thank you very much for your response. I still have two questions and hope you can reply. First, you suggest manually assigning labels of 0 or 1 to the data. What should the proportion of 0s to 1s be? Second, if I skip the pre-training step and download the model directly for use, where should I input my dataset? I did not find code for loading a custom dataset in either of the two tutorials. Looking forward to your reply.