churchlab/UniRep

Is "evotuning" supervised or unsupervised?

Closed this issue · 1 comment

dlnp2 commented

Thank you for your great work. I want to clearly understand "evotuning".

In the section "Generalization through accurate approximation of the fitness landscape" of the paper, you describe evotuning as

unsupervised weight updates of the UniRep mLSTM (Evotuned UniRep) performing the same next-character prediction task

In addition, the README of this repo describes it as

evotuned/unirep/: the weights, as a tensorflow checkpoint file, after 13k unsupervised weight updates on fluorescent protein homologs obtained with JackHMMer of the globally pre-trained UniRep (1900-unit model)

Thus evotuning seems to be unsupervised.

In the Methods section of the paper, however, it is described as

Model fine-tuning with extant GFP sequences. We loaded the weights learned by UniRep in the exact same architecture as before, but replacing the final layer, which previously predicted the next character, with a randomly initialized feed-forward layer with a single output and no nonlinearity

And in unirep_tutorial.ipynb, the model is composed of

the top model and the mLSTM

which seems to be consistent with the description in Methods. This is supervised, since the model is trained with the dummy target value '42' as the ground truth.

So, how should we understand evotuning? Can we evotune just by using the code in unirep_tutorial.ipynb? If so, was 42 your learning target in the paper?
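To make my confusion concrete, here is roughly how I read the tutorial's setup. This is only a simplified sketch with assumed shapes and variable names, not the notebook's actual code:

```python
import numpy as np

# Simplified sketch of the tutorial's *supervised* setup as I understand it
# (assumed shapes and names, not the notebook's actual code): the mLSTM yields
# a fixed-length representation of a sequence, and a randomly initialized
# feed-forward layer with a single output and no nonlinearity is regressed
# against a scalar target (in the tutorial, the dummy value 42).

rng = np.random.default_rng(0)

rep = rng.standard_normal(1900)       # stand-in for the 1900-unit mLSTM representation
w = rng.standard_normal(1900) * 0.01  # single-output feed-forward head
b = 0.0
target = 42.0                         # dummy ground-truth value from the tutorial
lr = 1e-4

for _ in range(200):
    pred = rep @ w + b
    err = pred - target
    loss = err ** 2                   # supervised regression loss
    w -= lr * 2 * err * rep           # gradient step on the head parameters
    b -= lr * 2 * err
```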

@dlnp2 Evotuning is unsupervised, but we don't have a tutorial available, as the process at the time of writing the paper depended heavily on cluster-specific resources in our computing environment. Since then, we haven't worked on making a general-purpose tutorial available, as members of the community have come up with an excellent reimplementation in JAX here: https://github.com/ElArkk/jax-unirep . Hope that helps.
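For contrast with the supervised top-model setup in the tutorial, a minimal sketch of the unsupervised objective that evotuning optimizes is below. This is illustrative only: the alphabet, shapes, and example sequence are assumptions, and the logits would in practice come from the pre-trained mLSTM rather than random numbers.

```python
import numpy as np

# Minimal sketch of the *unsupervised* evotuning objective: the only "labels"
# are the homolog sequences themselves, and the loss is the cross-entropy of
# predicting each next residue from the preceding ones. (Illustrative only;
# names and shapes are assumptions, not the paper's actual training code.)

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
char_to_idx = {c: i for i, c in enumerate(AMINO_ACIDS)}

def next_char_loss(logits, sequence):
    """Average cross-entropy of predicting residue t+1 from residues 1..t.

    logits: shape (len(sequence) - 1, 20); in practice these would come from
    the mLSTM run over each sequence prefix, here they are placeholders.
    """
    targets = np.array([char_to_idx[c] for c in sequence[1:]])
    shifted = logits - logits.max(axis=1, keepdims=True)  # stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# A fluorescent-protein-like sequence stands in for a JackHMMer homolog hit.
seq = "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTT"
logits = np.random.default_rng(0).standard_normal((len(seq) - 1, 20))
print(next_char_loss(logits, seq))
```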