
Request help for exporting the vectors for the method code

Sichengluis opened this issue · 4 comments

Hi Sir,
Thank you for your great work on code2vec, it really helped me a lot!

I am trying to build an AI model that predicts whether a method has a code smell; it is a binary classification problem. Since all my data is written in Java, I have used code2vec to represent each method as a vector. I have used your model directly in two different ways, and I have the following questions.

  1. The first way is to export the embedding of each token directly from your model. I am using your trained model as-is, and I exported the embedding of each token (tokens.txt) with it (python3 --load models/java14_model/saved_model_iter8.release --save_w2v models/java14_model/tokens.txt). I then used that file as a vocabulary to represent each token of my own methods (tokens that do not exist in it are treated as PAD_OR_OOV).

  2. The second way I tried was to use your model directly to derive the vector corresponding to each method. I am trying to export the vector for each of my methods with your model, using the following command: --load models/java14_model/saved_model_iter8.release --export_code_vectors --test methods_oneline.txt. There are thousands of lines in methods_oneline.txt, one method per line, but I always get the following error.
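For the first approach, the token lookup can be sketched in Python as below. This is a minimal sketch under several assumptions: that tokens.txt is in the word2vec text format (an optional "count dim" header, then one "token v1 v2 ..." line per token), that it contains a PAD_OR_OOV entry, and that tokens are stored lowercased without subtoken delimiters. Check a few lines of the file before relying on any of these. The helper names (parse_w2v_lines, load_w2v_text, normalize, embed_tokens, mean_pool) are hypothetical, and averaging token vectors is just one simple way to get a fixed-size method vector; it is not code2vec's own attention-based aggregation over path contexts.

```python
from typing import Dict, Iterable, List

# Assumed key for unknown tokens; verify it exists in your tokens.txt.
PAD_OR_OOV = "PAD_OR_OOV"


def parse_w2v_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse word2vec text-format lines into {token: vector}."""
    vectors: Dict[str, List[float]] = {}
    for i, line in enumerate(lines):
        parts = line.split()
        # Skip the optional "vocab_size dim" header on the first line.
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if len(parts) < 2:
            continue  # blank or malformed line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_w2v_text(path: str) -> Dict[str, List[float]]:
    """Read a word2vec text-format file such as tokens.txt (format assumed)."""
    with open(path, encoding="utf-8") as f:
        return parse_w2v_lines(f)


def normalize(token: str) -> str:
    """Hypothetical normalization: lowercase and drop non-alphanumerics (getName -> getname)."""
    return "".join(ch for ch in token.lower() if ch.isalnum())


def embed_tokens(tokens: List[str], vectors: Dict[str, List[float]], dim: int) -> List[List[float]]:
    """Look up each token, falling back to the PAD_OR_OOV vector (or zeros if it is absent)."""
    oov = vectors.get(PAD_OR_OOV, [0.0] * dim)
    return [vectors.get(normalize(t), oov) for t in tokens]


def mean_pool(rows: List[List[float]], dim: int) -> List[float]:
    """Average token vectors into one fixed-size feature vector for a classifier."""
    if not rows:
        return [0.0] * dim
    return [sum(row[j] for row in rows) / len(rows) for j in range(dim)]
```

The pooled vector can then be fed to any binary classifier; counting how many of your tokens fall back to PAD_OR_OOV is a quick way to see whether the normalization assumption matches the file.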

I know it's better to train a model from scratch on my own data, but I'd like to use your model directly. Do you have any suggestions for me? I'm new to AI; how can I do better while using your model directly?
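If the second approach works once the input is preprocessed, the exported method vectors still have to be matched with the code smell labels. The sketch below assumes the export writes one line per input example, each a whitespace-separated list of floats, in the same order as the input file; the output file's name and layout are assumptions to check against the code2vec README. parse_code_vectors, load_code_vectors and pair_with_labels are hypothetical helpers.

```python
from typing import Iterable, List, Sequence, Tuple


def parse_code_vectors(lines: Iterable[str]) -> List[List[float]]:
    """Parse one whitespace-separated float vector per non-empty line."""
    rows = [[float(x) for x in line.split()] for line in lines if line.strip()]
    dims = {len(r) for r in rows}
    if len(dims) > 1:
        raise ValueError(f"inconsistent vector lengths: {sorted(dims)}")
    return rows


def load_code_vectors(path: str) -> List[List[float]]:
    """Read an exported vectors file (layout assumed, see above)."""
    with open(path, encoding="utf-8") as f:
        return parse_code_vectors(f)


def pair_with_labels(rows: List[List[float]], labels: Sequence[int]) -> List[Tuple[List[float], int]]:
    """Zip vectors with 0/1 smell labels, assuming both follow the input file's order."""
    if len(rows) != len(labels):
        raise ValueError(f"{len(rows)} vectors but {len(labels)} labels")
    return list(zip(rows, labels))
```

The length check matters because a silently dropped input line would shift every label after it.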

Thank you and your team in advance, and sorry if my question was not clearly expressed or is too naive.

Hi @Sichengluis ,
Thank you for your interest in our work!

I think that the only issue is that the file methods_oneline.txt needs to be a file that was preprocessed using our pipeline. It seems that it might have passed through the JavaExtractor step here:

but the output might not have passed through this step?
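A quick way to tell whether an input file is in the preprocessed form rather than raw Java source is a heuristic like the one below. It assumes each preprocessed line is a method-name label followed by space-separated source,path,target contexts (trailing padding whitespace is ignored). That layout is an assumption to verify against a file the pipeline actually produces, and looks_preprocessed and first_raw_line are hypothetical helpers, not part of code2vec.

```python
from typing import Iterable, Optional


def looks_preprocessed(line: str) -> bool:
    """Heuristic: a label followed by 'source,path,target' context triples."""
    fields = line.split()
    if len(fields) < 2:
        return False
    return all(len(ctx.split(",")) == 3 for ctx in fields[1:])


def first_raw_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based number of the first non-blank line that does not look preprocessed, or None."""
    for number, line in enumerate(lines, start=1):
        if line.strip() and not looks_preprocessed(line):
            return number
    return None
```

Passing an open file handle for methods_oneline.txt to first_raw_line would return 1 if the file still contains one raw Java method per line.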

Let me know if I misunderstood your question.

Hi @urialon ,

Thank you for your reply!

It is a little bit hard for me to use it the second way. I wonder if the first way I mentioned above is feasible. Can I use the vocabulary to get the embedding of each word directly, as with word2vec?


Yes. But you don't need to export anything; you can just download the vocabularies. See:
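Once a vocabulary is loaded into a {token: vector} dictionary, it can be queried much like word2vec. The sketch below is a plain-Python cosine nearest-neighbour search over such a dictionary; cosine and most_similar are hypothetical names. If the downloaded files are in word2vec format (an assumption here), gensim's KeyedVectors.load_word2vec_format and most_similar provide the same functionality.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, vectors: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Return up to k tokens whose vectors are closest to the query's, word2vec-style."""
    if query not in vectors:
        return []
    q = vectors[query]
    scored = [(tok, cosine(q, vec)) for tok, vec in vectors.items() if tok != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A brute-force scan is fine for spot checks; for repeated queries over a large vocabulary, normalizing all vectors once and using a matrix library would be faster.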


Great! Thank you again!