danielzuegner/code-transformer

How to just get the code embedding of entire code_snippet, without having to predict any function name

TejaswiniiB opened this issue · 2 comments

Hi,
I want to get the code embedding of an entire code snippet. How can I get it?
As per the Code Snippet embedding section in interactive_prediction.ipynb, only the embedding of the masked method is returned. I don't have any masked method or function name to predict; I just want the embedding of the entire code. How can I do that?

Hi,

thanks for your interest in the Code Transformer.
What is your input? If it is not a code snippet containing a single method, our trained models most likely won't give meaningful embeddings. If it is a single method, you can just mask the method name to make it match our training setting.
Regarding the embeddings themselves:

  • encoder_output.all_emb[-1][1] gives you the query stream embedding (1) in the last layer (-1), which should encapsulate the whole snippet.
  • You can also try using encoder_output.all_emb[-1][0], which gives you the content stream embeddings (0) of the last layer (-1). These are defined per input token, and averaging them over the sequence length might give you meaningful embeddings as well.
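
For the second option, a minimal sketch of the averaging step might look like the following. It uses NumPy with a stand-in array instead of a real model output, and it assumes the content stream has shape (batch, seq_len, dim) and that a boolean padding mask is available. The helper name mean_pool, the mask, and the shapes are all assumptions for illustration, so check them against what encoder_output.all_emb[-1][0] actually contains before relying on this.

```python
import numpy as np


def mean_pool(content_emb, pad_mask):
    """Average per-token embeddings over the non-padding positions.

    content_emb: float array, ASSUMED shape (batch, seq_len, dim)
    pad_mask:    bool array, ASSUMED shape (batch, seq_len), True = padding
    """
    # 1.0 for real tokens, 0.0 for padding; extra axis broadcasts over dim
    keep = (~pad_mask).astype(content_emb.dtype)[..., None]
    summed = (content_emb * keep).sum(axis=1)    # (batch, dim)
    counts = np.maximum(keep.sum(axis=1), 1.0)   # guard against empty rows
    return summed / counts


# Stand-in for encoder_output.all_emb[-1][0] (hypothetical values and shape)
content_emb = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
# Hypothetical mask: the last token of the first snippet is padding
pad_mask = np.array([[False, False, True],
                     [False, False, False]])

snippet_emb = mean_pool(content_emb, pad_mask)
print(snippet_emb.shape)  # (2, 4)
```

Excluding padded positions keeps short snippets in a batch from being pulled towards the padding embedding. If the real tensor is a PyTorch tensor, the same arithmetic should carry over, although the axis order may differ.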

Hope this helps.

Best,
Tobias

Hi @tobias-kirschstein,
Thank you for your reply.
My input is code containing more than one method.
Thank you!