How to just get the code embedding of entire code_snippet, without having to predict any function name
TejaswiniiB opened this issue · 2 comments
Hi,
I want to get the code embedding of an entire code_snippet. How can I get it?
As per the "Code Snippet embedding" section in interactive_prediction.ipynb, it gives the embedding of only the masked method. I don't have any masked method or function name to predict. I just want the embedding of the entire code. How can I do that?
Hi,
Thanks for your interest in the Code Transformer.
What is your input? If it is not a code snippet containing a single method, our trained models most likely won't give meaningful embeddings. If it is a single method, you can just mask the method name to make it match our training setting.
Regarding the embeddings themselves:
- encoder_output.all_emb[-1][1] gives you the query stream embedding (index 1) in the last layer (index -1), which should encapsulate the whole snippet.
- You can also try encoder_output.all_emb[-1][0], which gives you the content stream embeddings (index 0) of the last layer (index -1). These are defined per input token, and averaging them over the sequence length might give you meaningful embeddings as well.
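To illustrate the averaging idea, here is a minimal sketch in plain Python. It assumes the content stream embeddings for one snippet can be read as a list of per-token vectors (sequence length x embedding dimension); the actual tensor layout of encoder_output.all_emb[-1][0] may differ, for example by having a batch dimension. The mean_pool helper and the keep mask for padding tokens are hypothetical names for illustration, not part of the Code Transformer API.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              keep: Sequence[bool]) -> List[float]:
    """Average per-token vectors, skipping positions where keep is False."""
    kept = [vec for vec, k in zip(token_embeddings, keep) if k]
    if not kept:
        raise ValueError("no tokens left to average")
    dim = len(kept[0])
    # Component-wise mean over the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Stand-in for encoder_output.all_emb[-1][0] of a single snippet
# (assumed layout: 3 tokens, embedding dimension 2; last token is padding).
content_stream = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
keep = [True, True, False]  # hypothetical padding mask

snippet_embedding = mean_pool(content_stream, keep)
```

Excluding padding positions matters because they would otherwise pull the average toward whatever values the padding tokens carry. With real tensors, the same thing is usually done with a masked mean over the sequence dimension.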
Hope this helps.
Best,
Tobias
Hi @tobias-kirschstein ,
Thank you for your reply.
My input is code containing multiple methods.
Thank you!