Paper Link: Optimizing non-decomposable measures with deep networks
To run DUPLE, DAME, DENIM, and Struct-ANN, go into the `deep_non_decomp_src` folder to see the code.
I apologize in advance for this code being inconsistent in several ways; I have edited it over a long period of time with significant breaks in between, which I blame for the inconsistency.
The following addresses are relative to the `deep_non_decomp_src` folder. All the data is in the `datasets` folder and is read through the wrapper in `datasets/dataRead.py`.
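For orientation, a minimal sketch of what a reader like the `datasets/dataRead.py` wrapper might look like. The function name, file layout, and `.npz` keys below are assumptions for illustration, not the wrapper's actual interface:

```python
import numpy as np

# Hypothetical stand-in for the dataRead wrapper; the real function
# name and archive keys in datasets/dataRead.py may differ.
def load_dataset(name, data_dir="datasets"):
    """Return (features, labels) for a dataset stored as an .npz archive."""
    data = np.load(f"{data_dir}/{name}.npz")
    return data["X"], data["y"]
```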
- Ensure that the variable `dual_class` in Line 15 is set to one of the classes in `DeeSpade.dual_step`.
- Ensure that the variable `model` in Line 22 is set to `Spade`.
- Then run `python train_batch_opt.py [dataset]`.
- The score is accumulated in Lines 72 and 73.
- Use Lines 96 and 97 to save it to file.
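The save step amounts to dumping the accumulated per-epoch scores to disk as an `.npz` archive. A minimal sketch; the array and file names here are illustrative, not the ones used in `train_batch_opt.py`:

```python
import numpy as np

# Illustrative only: scores accumulated across epochs (cf. the
# accumulation lines in train_batch_opt.py).
prec_scores = [0.71, 0.74, 0.78]
rec_scores = [0.65, 0.69, 0.70]

# Persist the score trajectories for later plotting (cf. the save lines).
np.savez("mydataset_spade_scores.npz",
         prec=np.array(prec_scores),
         rec=np.array(rec_scores))
```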
- The variable `dual_class` is inconsequential.
- Ensure that the variable `model` in Line 22 is set to `BenchANN`.
- Then run `python train_batch_opt.py [dataset]`.
- All the scores are accumulated in `minC` in Line 71.
- Save them through Line 98.
- You will have to make a trivial change in the BenchANN file to get rid of the p-sensitive cost function and obtain the true cost: comment Line 40 in `DeeSpade/bench.py` and uncomment Line 42.
- The variable `dual_class` is inconsequential.
- Ensure that the variable `model` in Line 22 is set to `BenchANN`.
- Then run `python train_batch_opt.py [dataset]`.
- All the scores are accumulated in `minC` in Line 71.
- Comment Lines 72 and 73.
- Extract the different scores from Lines 76 to 80.
- Save them through Lines 99 to 102.
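The saved `.npz` score files can be inspected afterwards with NumPy. The key name below (`minC`) matches the accumulating variable, but the keys actually stored depend on how the save lines were written, so listing `scores.files` first is the safe move:

```python
import numpy as np

# Create a toy score file so this snippet is self-contained;
# in practice this is the archive written by train_batch_opt.py.
np.savez("toy_scores.npz", minC=np.array([0.42, 0.39, 0.37]))

scores = np.load("toy_scores.npz")
print(scores.files)        # list the arrays stored in the archive
print(scores["minC"][-1])  # e.g. the final-epoch score
```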
- The F-beta score is the only score we see here. The code for that is in `DAMP.ANNAMP/FbetaANN`.
- Run `python ANNAMPTrain.py [dataset]`.
- The scores are stored in `[dataset]ANNAMAP_FMeas_new.npz`.
- The code is in `DAMP.AMP.FbetaThresh`.
- Run `python AMPTrain.py [dataset]`.
- The scores are stored in `[dataset]AMP_PG.npz`.
- Here we only look at NegKLD. The code is in `DAMP.AMP.FbetaThresh` and the primal and dual steps are in `demesis.concave_fn.KLD`.
- Run `python train_denembis_kld.py [dataset]`.
- The score is stored in `[dataset]_kld_rew.npz`.
- Some files also calculate BAKLD, but those values can be ignored.
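For reference, the KLD quantification measure compares the true class prevalences against the estimated ones. A minimal pure-Python version; the `eps` smoothing constant is an assumption, and the repo's KLD implementation may handle smoothing differently:

```python
import math

def kld(p_true, p_hat, eps=1e-8):
    """KL divergence sum_i p_i * log(p_i / q_i) between two discrete
    distributions (here: true vs. estimated class prevalences).
    eps guards against zero estimates."""
    return sum(p * math.log(p / max(q, eps))
               for p, q in zip(p_true, p_hat) if p > 0)

# Identical distributions give 0; NegKLD is simply -kld(...).
print(kld([0.3, 0.7], [0.3, 0.7]))  # 0.0
```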
- The MVC code is present in `all_struct/c_code/mvc.c` and the shared library is already compiled in the folder as `libmvc.so`.
- This is then used by the network definition and training algorithm in `all_struct/struct_ann.py`; the final training wrapper is `train_batch_struct.py`.
- Ignoring the details, to train run the command `python train_batch_struct.py [dataset] [loss_fn]`, where the `[dataset]` variable is as usual and the `[loss_fn]` variable is defined in `all_struct/loss_functions.py`. We only use `minTPRTNR` and `fone` among those.
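For orientation, the two measures used here correspond to min(TPR, TNR) and the F1 score. A sketch of the raw measures computed from confusion-matrix counts; the entries in `all_struct/loss_functions.py` are structured surrogate losses built around these, not these plain functions:

```python
def min_tpr_tnr(tp, fp, tn, fn):
    """Worst of true-positive rate and true-negative rate."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return min(tpr, tnr)

def fone(tp, fp, tn, fn):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

print(min_tpr_tnr(tp=40, fp=10, tn=30, fn=20))  # min(40/60, 30/40)
print(fone(tp=40, fp=10, tn=30, fn=20))         # 80/110
```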
- Run the necessary training files to obtain the score files.
- Then run the corresponding plot file, i.e. one of `plot_[Fmeas, KLD, MinTPRTNR, QMean].py [x_axis_length]`.
The following addresses are relative to the `seq2seq-attn` folder.

```
th train1.lua -data_file data/twit/twit-train.hdf5 -val_data_file data/twit/twit-val.hdf5 -savefile twit-model
th evaluate1.lua -model twit-model_final.t7 -src_file data/twit/src-val.txt -output_file pred.txt -src_dict data/twit/twit.src.dict -targ_dict data/twit/twit.targ.dict
```
If you use this code, please cite the paper:

```
@Article{Sanyal2018,
author="Sanyal, Amartya
and Kumar, Pawan
and Kar, Purushottam
and Chawla, Sanjay
and Sebastiani, Fabrizio",
title="Optimizing non-decomposable measures with deep networks",
journal="Machine Learning",
year="2018",
month="Sep",
day="01",
volume="107",
number="8",
pages="1597--1620",
doi="10.1007/s10994-018-5736-y",
}
```