churchlab/UniRep

self._top_final_hidden question

spark157 opened this issue · 2 comments

Generally, what is the idea of `_top_final_hidden`, i.e. what is it supposed to generate?

Following the code:
```python
self._top_final_hidden = tf.gather_nd(
    self._output,
    tf.stack(
        [tf.range(tf_get_shape(self._output)[0], dtype=tf.int32), indices],
        axis=1,
    ),
)
```

I'm having a bit of trouble following what it actually returns. It seems to me to be the representation for the final amino acid in the sequence (and it's a bit confusing to me why that is what is wanted).
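To check my reading, here is a toy, standalone version of that `gather_nd` pattern (shapes and names are purely illustrative, not from the repo):

```python
import numpy as np
import tensorflow as tf

# output: [batch, time, hidden]; indices: last real position of each
# (possibly padded) sequence.
output = tf.constant(np.arange(24).reshape(2, 3, 4), dtype=tf.float32)
indices = tf.constant([2, 1], dtype=tf.int32)

batch_range = tf.range(tf.shape(output)[0], dtype=tf.int32)   # [0, 1]
gather_idx = tf.stack([batch_range, indices], axis=1)         # [[0, 2], [1, 1]]
final_hidden = tf.gather_nd(output, gather_idx)               # shape [2, 4]

with tf.Session() as sess:
    print(sess.run(final_hidden))
    # Row 0 is output[0, 2, :] and row 1 is output[1, 1, :], i.e. the
    # hidden state at each sequence's final position.
```

So each row is the hidden state at one selected time step per sequence, which matches my reading above.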

Also, I can see that `get_rep_ops` returns `self._top_final_hidden` in place of the average hidden (noted as POSTPONED in the code). Does this mean that the notebook tutorial is not actually using `avg_hidden` as the paper does?

Thanks if you can shed some light on this.

Scott

Ah, I see this is described clearly in the paper:

> We extracted final hidden state produced by the model when predicting the last amino acid in a protein sequence (Final Hidden)

My question about the tutorial still applies, though.

Thanks.

Scott

Hi Scott,

Thanks for your question. When you call `get_rep`, it returns a tuple of `avg_hidden`, `final_hidden`, and `final_cell`, which are the three categories of basic representations we tested in the paper. This is the function we used to produce `avg_hidden`, the most successful representation and the one used throughout the paper. If you are trying to produce `avg_hidden` representations for sequences, please use this function.
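For concreteness, a minimal sketch of that workflow (the class name and weights path follow the tutorial; treat the path, batch size, and sequence as placeholder assumptions):

```python
from unirep import babbler1900  # use babbler64 for the smaller model

# The weights path is an assumption; point it at wherever you
# downloaded and unpacked the pretrained 1900-unit weights.
b = babbler1900(batch_size=12, model_path="./1900_weights")

seq = "MRKGEELFTGVVPILVELDGDVNGHKFSVSG"  # any valid amino-acid sequence

# get_rep runs the pretrained model over the sequence and returns
# static representations -- no graph surgery needed.
avg_hidden, final_hidden, final_cell = b.get_rep(seq)

# avg_hidden is the representation used throughout the paper; train a
# top model on it as if it were a fixed-length feature vector.
```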

The tutorial makes it clear that `get_rep_ops` returns the `final_hidden` graph op, not an average hidden graph op. This is useful if you intend to jointly optimize a top model and the entire UniRep network. Otherwise, you can produce static representations using `get_rep` and then train top models on them as if they were feature vectors, which is what we do throughout the paper. We included the `final_hidden` op to encourage future work on tuning the whole network to a specific task.
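To sketch the joint-tuning path (the `get_rep_ops` return signature here is the one the tutorial uses; treat it, the weights path, and the toy top model as assumptions):

```python
import tensorflow as tf
from unirep import babbler1900

b = babbler1900(batch_size=12, model_path="./1900_weights")  # path is an assumption

# Graph ops from the pretrained model, unpacked as in the tutorial notebook.
final_hidden, x_ph, batch_size_ph, seq_len_ph, init_state_ph = b.get_rep_ops()

# A toy regression head on final_hidden. Because final_hidden is a graph
# op rather than a static vector, minimizing the loss below backpropagates
# into the UniRep weights as well, tuning the whole network jointly.
y_ph = tf.placeholder(tf.float32, shape=[None, 1], name="y")
with tf.variable_scope("top"):
    prediction = tf.layers.dense(final_hidden, units=1, activation=None)

loss = tf.losses.mean_squared_error(y_ph, prediction)
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
```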

We don't have an average hidden graph op because it is not how UniRep was trained, it is not needed to reproduce anything we show in the paper, and it is non-trivial to implement (plus it is unclear to me whether it makes sense to propagate gradients that skip back through time, as they would in an average along the time/position axis).
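For reference, a naive version of such an op would be a masked mean over the time axis, something like the untested sketch below (whether backpropagating through it is sensible is exactly the concern above):

```python
import tensorflow as tf

def avg_hidden_op(output, seq_lengths):
    """Hypothetical masked mean over time; NOT part of the repo.

    output: [batch, time, hidden] hidden states.
    seq_lengths: [batch] true (unpadded) sequence lengths.
    """
    # Zero out padded positions so they don't contribute to the mean.
    mask = tf.sequence_mask(seq_lengths, maxlen=tf.shape(output)[1],
                            dtype=output.dtype)           # [batch, time]
    masked = output * tf.expand_dims(mask, axis=-1)       # [batch, time, hidden]
    totals = tf.reduce_sum(masked, axis=1)                # [batch, hidden]
    lengths = tf.cast(tf.expand_dims(seq_lengths, -1), output.dtype)
    return totals / lengths                               # mean over real steps
```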