How to change the code to work for multi-label classification?

Question

allomy opened this issue 3 years ago · 5 comments

I'm trying to use code2vec for multi-label classification, where one sample can belong to several labels. Could you give some suggestions on how to change the model?

Thank you in advance for your help!

Answer 1 · 2021-11-15T19:55:01.000Z

Hi @allomy ,
Thank you for your interest in code2vec!

I think that you can change the loss here:
https://github.com/tech-srl/code2vec/blob/master/tensorflow_model.py#L228
from the standard cross entropy to sigmoid cross entropy: https://www.tensorflow.org/api_docs/python/tf/compat/v1/nn/sigmoid_cross_entropy_with_logits
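
To make the difference between the two losses concrete, here is a minimal NumPy sketch. It is not the code2vec code: the function names are hypothetical, and it assumes the labels arrive as a multi-hot vector (one 0/1 entry per target in the vocabulary). The sigmoid version uses the numerically stable form described in the TensorFlow documentation for `sigmoid_cross_entropy_with_logits`.

```python
import numpy as np

def softmax_cross_entropy(logits, target_index):
    """Single-label loss: exactly one correct class per example."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(target_index)), target_index]

def sigmoid_cross_entropy(logits, multi_hot_labels):
    """Multi-label loss: every class is an independent yes/no decision.

    Stable form: max(x, 0) - x * z + log(1 + exp(-|x|)).
    """
    x, z = logits, multi_hot_labels
    per_class = np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))
    # Summing over classes is an assumption; averaging is also common.
    return per_class.sum(axis=-1)

# One example with three possible labels; labels 0 and 2 are both correct.
logits = np.array([[2.0, -1.0, 0.5]])
labels = np.array([[1.0, 0.0, 1.0]])
loss = sigmoid_cross_entropy(logits, labels)
```

In the actual model you would call the TensorFlow op on the logits tensor instead, with labels of the same shape as the logits.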

But you will also need to change the pipeline to support reading multi-labeled examples. Follow the variable target_index here: https://github.com/tech-srl/code2vec/blob/master/path_context_reader.py
and modify it to get a list of targets for every example.
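
As a rough illustration of what "a list of targets for every example" could look like once it is parsed, here is a small pure-Python sketch. Everything in it is an assumption made for the example: the `,` separator between labels, the `target_to_index` dictionary, and the function name are hypothetical and are not part of path_context_reader.py.

```python
def targets_to_multi_hot(target_field, target_to_index, separator=","):
    """Turn a field like 'labelA,labelB' into a multi-hot vector.

    The separator and the vocabulary dict are hypothetical; adapt them
    to however your dataset stores multiple targets.
    """
    multi_hot = [0.0] * len(target_to_index)
    for label in target_field.split(separator):
        index = target_to_index.get(label)
        if index is not None:  # labels outside the vocabulary are skipped
            multi_hot[index] = 1.0
    return multi_hot

# Hypothetical three-label vocabulary and one example with two labels.
vocab = {"io": 0, "network": 1, "parsing": 2}
encoded = targets_to_multi_hot("io,parsing", vocab)
```

A vector like this has one entry per target in the vocabulary, which is the label shape a sigmoid cross entropy loss expects.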

Best,
Uri

Answer 2 · 2021-11-17T02:17:27.000Z

Hi @urialon , thank you for your quick response. I'll try it soon.

Answer 3 · 2021-12-03T06:49:24.000Z

Hi @urialon , sorry for the delayed response. I tried to modify the code related to target_index, but got lost in the code... Could you give more information about modifying it to get a list of targets for every sample? Thank you in advance for your help.

Answer 4 · 2021-12-06T14:10:57.000Z

Hi @allomy ,
Actually it might be easiest for you to use https://code2seq.org/ .
It predicts a sequence of labels rather than an unordered set of labels (multi-label), but it may either be a good approximation, or be easier to adapt for multi-label (just change the loss computation, not the entire data reading pipeline).

Best,
Uri

Answer 5 · 2021-12-07T02:04:51.000Z

Thank you @urialon , I will take a look at code2seq.