churchlab/UniRep

Diversity in evotuning sequence data

wjs20 opened this issue · 1 comment

wjs20 commented

Hi

I am attempting to apply the approach outlined in your paper to antibody sequences. For the evotuning step I used ~60,000 paired human and mouse VH-VL sequences from the OAS database, and then trained a ridge regression top model, as you suggested, on two small public antibody affinity datasets (~40-80 sequences). I'm getting small correlation coefficients (~30) between predicted and ground-truth values on the test set for the larger dataset, but there seems to be no reproducible correlation for the smaller of the two.
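For concreteness, here is a minimal sketch of the kind of top-model evaluation described above. It is not the UniRep authors' pipeline: it assumes each sequence has already been embedded as a fixed-length vector, uses synthetic random data in place of real embeddings and affinities, and the helper names `fit_ridge` and `repeated_cv_pearson` are hypothetical. It repeats k-fold cross-validation over several random splits so that the spread of the correlation is visible, which matters when there are only a few dozen sequences.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weight vector.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def repeated_cv_pearson(X, y, alpha=1.0, n_splits=5, n_repeats=20, seed=0):
    """Pearson r between held-out predictions and targets, one value per repeat."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        order = rng.permutation(len(y))  # fresh random split each repeat
        preds = np.empty(len(y))
        for test_idx in np.array_split(order, n_splits):
            train_idx = np.setdiff1d(order, test_idx)
            w, b = fit_ridge(X[train_idx], y[train_idx], alpha)
            preds[test_idx] = X[test_idx] @ w + b
        scores.append(np.corrcoef(preds, y)[0, 1])
    return np.array(scores)


# Synthetic stand-in data (assumption): 60 "sequences", 64-dim "embeddings".
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 64))
y = X[:, :3].sum(axis=1) + rng.normal(scale=1.0, size=60)

scores = repeated_cv_pearson(X, y, alpha=10.0)
print(f"Pearson r over repeats: mean={scores.mean():.2f}, std={scores.std():.2f}")
```

Reporting the mean and standard deviation across repeats, rather than a single train/test split, gives a rough sense of whether a correlation on a dataset this small is stable. The regularization strength `alpha` here is an arbitrary placeholder and would normally be tuned on the training folds only.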

I was wondering whether changing the evotuning regime might improve results. Do you think including a more diverse set of sequences would help, e.g. antibodies from more individuals or species, or T-cell receptors?

How applicable do you think this kind of transfer learning approach is to antibodies relative to other protein applications?

Thanks

Hey @wjs20, the issues page should be used for resolving code confusion and bugs. This question is interesting, but too open-ended to answer here. Happy to chat offline: surge [at] nabla [dot] bio