muntakimrafi/enforemer_embeddings

Repository containing code to extract the embeddings of the Enformer model.


Instructions to extract Enformer's embeddings

Install Python 3.8 and the requirements.
Clone the Enformer repository to ./enformer/.
Download the pretrained weights and store them in ./weights/.
Add the following function to the Enformer class in the ./enformer/enformer.py module:

### ADDITIONAL FUNCTION TO EXTRACT EMBEDDINGS ###
  @tf.function(input_signature=[
      tf.TensorSpec([None, SEQUENCE_LENGTH, 4], tf.float32)])
  def extract_features(self, x):
    return self.trunk(x, is_training=False)

Set the SEQUENCE_LENGTH variable to the correct value (196_608 or 393_216, depending on the version of the weights).
In the embed_input_sequences.ipynb notebook, set the paths to the sequences and implement the functions that read the sequences (marked as TODOs).
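
The sequence-reading functions are left as TODOs in the notebook, so the following is only a minimal sketch of what they might produce. It turns DNA strings into a float32 batch of shape [batch, SEQUENCE_LENGTH, 4], which matches the input signature of extract_features above. The helper names (one_hot_encode, fit_to_length, make_batch), the A, C, G, T channel order, and the choice to center-crop or pad with N are assumptions made for illustration, not taken from the repository.

```python
import numpy as np

SEQUENCE_LENGTH = 196_608  # or 393_216, depending on the version of the weights

# Assumption: channels are ordered A, C, G, T; any other character
# (for example N) is encoded as an all-zero row.
_LOOKUP = np.zeros((256, 4), dtype=np.float32)
for _channel, _base in enumerate("ACGT"):
    _LOOKUP[ord(_base), _channel] = 1.0


def one_hot_encode(sequence: str) -> np.ndarray:
    """Return a float32 array of shape [len(sequence), 4]."""
    codes = np.frombuffer(sequence.upper().encode("ascii"), dtype=np.uint8)
    return _LOOKUP[codes]


def fit_to_length(sequence: str, length: int = SEQUENCE_LENGTH) -> str:
    """Center-crop, or pad with N on both sides, to exactly `length` bases."""
    if len(sequence) >= length:
        start = (len(sequence) - length) // 2
        return sequence[start:start + length]
    missing = length - len(sequence)
    left = missing // 2
    return "N" * left + sequence + "N" * (missing - left)


def make_batch(sequences) -> np.ndarray:
    """Stack sequences into an array of shape [batch, SEQUENCE_LENGTH, 4]."""
    return np.stack([one_hot_encode(fit_to_length(s)) for s in sequences])
```

An array built this way can be passed to extract_features once the model and weights are loaded. Whether padding with N is appropriate depends on the data, so treat that part as a placeholder.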

Additional testing:

Download the testing data and run the examples in the notebook.