tech-srl/code2vec

How to change the code to work for multi-label classification?

allomy opened this issue · 5 comments

I'm trying to use code2vec for multi-label classification, where one sample can belong to several labels. Could you give some suggestions on what to change in the model?

Thank you in advance for your help!

Hi @allomy ,
Thank you for your interest in code2vec!

I think that you can change the loss here:
https://github.com/tech-srl/code2vec/blob/master/tensorflow_model.py#L228
from the standard cross entropy to sigmoid cross entropy: https://www.tensorflow.org/api_docs/python/tf/compat/v1/nn/sigmoid_cross_entropy_with_logits
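
To make the difference concrete, here is a minimal sketch of what sigmoid cross entropy computes, written in plain Python rather than TensorFlow so it runs on its own. It follows the numerically stable formula given in the linked documentation. The example labels and logits are made up for illustration. In the model itself, the equivalent would presumably be a call such as `tf.compat.v1.nn.sigmoid_cross_entropy_with_logits(labels=..., logits=...)`, with `labels` as a float multi-hot tensor of the same shape as `logits`.

```python
import math


def sigmoid_cross_entropy_with_logits(labels, logits):
    """Per-class sigmoid cross entropy in its numerically stable form.

    Uses the formula documented for tf.nn.sigmoid_cross_entropy_with_logits:
        max(x, 0) - x * z + log(1 + exp(-|x|))
    where x is a logit and z is the 0/1 label for that class.
    """
    return [
        max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
        for z, x in zip(labels, logits)
    ]


# Hypothetical example: 4 target classes; this sample carries classes 0 and 2.
multi_hot_labels = [1.0, 0.0, 1.0, 0.0]
logits = [2.0, -1.0, 0.5, -3.0]

# Each class is scored independently, so several classes can be "on" at once.
per_class_loss = sigmoid_cross_entropy_with_logits(multi_hot_labels, logits)
example_loss = sum(per_class_loss)
print(per_class_loss, example_loss)
```

Unlike softmax cross entropy, the classes do not compete for probability mass here, which is what allows more than one label per example.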

But you will also need to change the pipeline to support reading multi-labeled examples. Follow the variable target_index here: https://github.com/tech-srl/code2vec/blob/master/path_context_reader.py
and modify it to get a list of targets for every example.
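
As a rough illustration of "a list of targets for every example", the sketch below turns one target field into a multi-hot vector. It is not code from `path_context_reader.py`: the function name, the `;` separator, the toy vocabulary, and the choice to skip unknown labels are all assumptions made for this example, and the real reader works on TensorFlow tensors rather than Python lists.

```python
def targets_to_multi_hot(target_field, target_to_index, separator=";"):
    """Convert a multi-label target field into a multi-hot vector.

    target_field:    string such as "sort;parse" (the separator is hypothetical)
    target_to_index: dict mapping label -> index, a stand-in for the
                     model's target vocabulary
    Labels missing from the vocabulary are skipped (an assumption).
    """
    multi_hot = [0.0] * len(target_to_index)
    for label in target_field.split(separator):
        index = target_to_index.get(label.strip())
        if index is not None:
            multi_hot[index] = 1.0
    return multi_hot


# Toy vocabulary, for illustration only.
vocab = {"sort": 0, "search": 1, "parse": 2, "hash": 3}

print(targets_to_multi_hot("sort;parse", vocab))
```

A vector of this shape is what a sigmoid cross entropy loss expects as its labels, one entry per target class.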

Best,
Uri

Hi @urialon , thank you for your quick response. I'll try it soon.

Hi @urialon , sorry for the delayed response. I tried to modify the code related to target_index, but got lost in the code... Could you give more information about how to modify it to get a list of targets for every sample? Thank you in advance for your help.

Hi @allomy ,
Actually it might be easiest for you to use https://code2seq.org/ .
It predicts a sequence of labels rather than a multi-label set, but it may be either a good approximation or easier to adapt for multi-label (you would only change the loss computation, not the entire data reading pipeline).
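
One way to read a predicted sequence as a multi-label output is to ignore order and duplicates and compare it to the gold labels as a set. The sketch below does only that; the function name and the example labels are hypothetical, and it is not taken from code2seq.

```python
def set_f1(predicted_sequence, gold_labels):
    """F1 between a predicted label sequence and the gold labels,
    treating both as unordered sets (order and repeats are ignored)."""
    predicted, gold = set(predicted_sequence), set(gold_labels)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction: order differs and one label is repeated.
print(set_f1(["sort", "parse", "sort"], ["parse", "sort"]))
```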

Best,
Uri

Thank you @urialon , I will take a look at code2seq.