How to just get the code embedding of entire code_snippet, without having to predict any function name

Question

TejaswiniiB opened this issue 2 years ago · 2 comments

Hi,
I want to get the code embedding of an entire code_snippet. How can I get it?
The Code Snippet embedding section in interactive_prediction.ipynb only gives the embedding of the masked method. I don't have a masked method or function name to predict; I just want the embedding of the entire code. How can I do this?

Answer 1 · 2022-08-14T15:00:02.000Z

Hi,

thanks for your interest in the Code Transformer.
What is your input? If it is not a code snippet containing a single method, our trained models most likely won't give meaningful embeddings. If it is a single method, you can just mask the method name to make it match our training setting.
Regarding the embeddings themselves:

encoder_output.all_emb[-1][1] gives you the query stream embedding (1) in the last layer (-1), which should encapsulate the whole snippet.
You can also try encoder_output.all_emb[-1][0], which gives you the content stream embeddings (0) of the last layer (-1). These are defined per input token, and averaging them over the sequence length might give you meaningful embeddings as well.
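
For the averaging option, here is a minimal sketch of mean pooling over the sequence axis. It makes a few assumptions that are not confirmed anywhere in this thread: that the content stream embeddings are laid out as (batch, seq_len, dim), that a boolean padding mask with True at padded positions is available, and that the values have been converted to a NumPy array (for a PyTorch tensor, `.detach().cpu().numpy()` does that). The names `mean_pool` and `padding_mask` are made up for illustration and are not part of the Code Transformer API. Check the actual tensor shape first, since the sequence axis may be somewhere else.

```python
import numpy as np


def mean_pool(token_embeddings, padding_mask=None):
    """Average per-token embeddings over the sequence axis.

    token_embeddings: array of shape (batch, seq_len, dim). ASSUMED layout.
    padding_mask: optional bool array of shape (batch, seq_len),
                  True where a position is padding. ASSUMED convention.
    Returns an array of shape (batch, dim).
    """
    emb = np.asarray(token_embeddings, dtype=np.float64)
    if padding_mask is None:
        # No mask given: plain mean over all positions.
        return emb.mean(axis=1)

    # Weight real tokens with 1 and padded positions with 0.
    keep = ~np.asarray(padding_mask, dtype=bool)
    weights = keep[..., None].astype(emb.dtype)
    summed = (emb * weights).sum(axis=1)
    # Guard against division by zero for fully padded rows.
    counts = np.clip(weights.sum(axis=1), 1.0, None)
    return summed / counts


# Stand-in values playing the role of encoder_output.all_emb[-1][0]:
# one snippet, three token positions, two dimensions, last position padded.
content_emb = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
padding_mask = np.array([[False, False, True]])

snippet_emb = mean_pool(content_emb, padding_mask)
print(snippet_emb.shape)
```

Masking matters here because padded positions would otherwise pull the average toward whatever values the model produces for padding tokens.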

Hope this helps.

Best,
Tobias

Answer 2 · 2022-08-16T04:25:35.000Z

Hi @tobias-kirschstein,
Thank you for your reply.
My input is code containing multiple methods.
Thank you!