How to change the code to work for multi-label classification?
allomy opened this issue · 5 comments
I'm trying to use code2vec for multi-label classification, where one sample can belong to several labels. Could you give some suggestions on what to change in the model?
Thank you in advance for your help!
Hi @allomy ,
Thank you for your interest in code2vec!
I think that you can change the loss here:
https://github.com/tech-srl/code2vec/blob/master/tensorflow_model.py#L228
from the standard cross entropy to sigmoid cross entropy: https://www.tensorflow.org/api_docs/python/tf/compat/v1/nn/sigmoid_cross_entropy_with_logits
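To make the difference concrete, here is a minimal pure-Python sketch of what sigmoid cross entropy computes per label. It uses the numerically stable form given in the TensorFlow documentation for `sigmoid_cross_entropy_with_logits`, `max(x, 0) - x * z + log(1 + exp(-|x|))`, where `x` is the logit and `z` is the 0/1 label. The example labels and logits are made up for illustration, and this is not the code2vec implementation; in the model itself you would call the TensorFlow function with `labels=` and `logits=` tensors of the same shape.

```python
import math

def sigmoid_cross_entropy_with_logits(labels, logits):
    """Element-wise sigmoid cross entropy for one example.

    Uses the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)),
    where x is the logit and z is the 0/1 label for that target.
    """
    return [
        max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
        for z, x in zip(labels, logits)
    ]

# Hypothetical example: a vocabulary of 4 targets, and the sample
# belongs to targets 0 and 2 (a multi-hot label vector).
labels = [1.0, 0.0, 1.0, 0.0]
logits = [2.0, -1.0, 0.5, -3.0]

per_label_loss = sigmoid_cross_entropy_with_logits(labels, logits)
example_loss = sum(per_label_loss)  # one possible reduction over labels
```

Unlike softmax cross entropy, each target is scored independently, so several targets can be "on" for the same example.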
But you will also need to change the pipeline to support reading multi-labeled examples. Follow the variable target_index here: https://github.com/tech-srl/code2vec/blob/master/path_context_reader.py
and modify it to get a list of targets for every example.
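As a rough illustration of "a list of targets for every example", the sketch below turns one target field into a list of indices and then into a multi-hot vector that could be fed to a sigmoid loss. Everything here is an assumption for illustration: the `;` separator, the `parse_targets` and `to_multi_hot` helpers, and the small `target_to_index` vocabulary are hypothetical and are not part of path_context_reader.py.

```python
def parse_targets(target_field, target_to_index, separator=";"):
    """Map a multi-label target field to a sorted list of target indices.

    The separator is a hypothetical choice for this sketch; targets that
    are missing from the vocabulary are skipped.
    """
    names = target_field.split(separator)
    return sorted({target_to_index[n] for n in names if n in target_to_index})

def to_multi_hot(target_indices, num_targets):
    """Build a 0/1 vector with a 1.0 at every index the example belongs to."""
    vector = [0.0] * num_targets
    for index in target_indices:
        vector[index] = 1.0
    return vector

# Hypothetical target vocabulary and a sample labeled with two targets.
target_to_index = {"read": 0, "write": 1, "close": 2, "open": 3}
indices = parse_targets("read;close", target_to_index)
multi_hot = to_multi_hot(indices, len(target_to_index))
```

In the real reader, the same idea would have to be expressed with tensor operations, since the single target_index per example becomes a variable-length list.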
Best,
Uri
Hi @urialon , sorry for the delayed response. I have tried to modify the code related to target_index, but got lost in the code... Could you give more information about modifying it to get a list of targets for every sample? Thank you in advance for your help.
Hi @allomy ,
Actually it might be easiest for you to use https://code2seq.org/ .
It predicts a sequence of labels rather than a set of labels (multi-label), but it may either be a good approximation or be easier to adapt for multi-label (just change the loss computation, not the entire data reading pipeline).
Best,
Uri