Unable to recreate the results for UpperBound Experiments

Question

Closed this issue 4 months ago · 14 comments

Hey, thanks for the code and the QA dataset.

I am trying to reproduce the upper-bound results reported in the paper on the PTB-XL dataset. I followed every step in the README, but my results are worse than the reported ones. Is there anything I am missing?
For the SE-WRN model, I am getting a macro-averaged AUC of 0.824.

Thanks

Answer 1 · 2024-07-31T11:21:54.000Z

Hi,

Could you let me know how you calculated the macro-AUROC?
Did you get AUROCs for each of 83 attributes and macro-average them?

Answer 2 · 2024-07-31T11:30:13.000Z

It logs the macro AUROC, doesn't it?
I checked the code, and it averages across all 83 attributes.

criterion:
  _name: multi_head_binary_cross_entropy
  report_auc: true
  log_per_class: true
  per_log_keys: [attribute_id]

Example:
"test_auroc": "0.820812",

[2024-07-21 20:48:16,450][test][INFO] - {"epoch": 22, "test_loss": "1.091", "test_nsignals": "125.077", "test_accuracy": "0.77429", "test_cls_73_accuracy": "0.83333", "test_cls_11_accuracy": "0.66538", "test_cls_61_accuracy": "0.83153", "test_cls_52_accuracy": "0.75223", "test_cls_53_accuracy": "0.78659", "test_cls_30_accuracy": "0.88333", "test_cls_81_accuracy": "0.9", "test_cls_24_accuracy": "0.98333", "test_cls_37_accuracy": "0.88084", "test_cls_63_accuracy": "0.9", "test_cls_29_accuracy": "0.83333", "test_cls_20_accuracy": "0.86667", "test_cls_67_accuracy": "0.96667", "test_cls_51_accuracy": "0.81667", "test_cls_69_accuracy": "0.69744", "test_cls_55_accuracy": "0.74854", "test_cls_3_accuracy": "0.78333", "test_cls_14_accuracy": "0.94444", "test_cls_21_accuracy": "0.96667", "test_cls_16_accuracy": "0.79615", "test_cls_35_accuracy": "0.80392", "test_cls_6_accuracy": "0.98333", "test_cls_46_accuracy": "0.72222", "test_cls_74_accuracy": "0.8", "test_cls_13_accuracy": "0.91667", "test_cls_76_accuracy": "0.78333", "test_cls_25_accuracy": "0.9", "test_cls_23_accuracy": "0.70142", "test_cls_45_accuracy": "0.71698", "test_cls_18_accuracy": "0.88333", "test_cls_26_accuracy": "0.81667", "test_cls_68_accuracy": "0.76667", "test_cls_38_accuracy": "0.70846", "test_cls_49_accuracy": "0.71833", "test_cls_12_accuracy": "0.56667", "test_cls_22_accuracy": "0.78", "test_cls_2_accuracy": "0.81667", "test_cls_19_accuracy": "0.83333", "test_cls_31_accuracy": "0.89474", "test_cls_36_accuracy": "0.88333", "test_cls_57_accuracy": "1", "test_cls_27_accuracy": "0.82456", "test_cls_58_accuracy": "0.88333", "test_cls_10_accuracy": "0.94", "test_cls_79_accuracy": "0.78333", "test_cls_42_accuracy": "0.9", "test_cls_48_accuracy": "0.76429", "test_cls_78_accuracy": "0.76667", "test_cls_4_accuracy": "0.73333", "test_cls_65_accuracy": "0.88333", "test_cls_33_accuracy": "0.82759", "test_cls_70_accuracy": "0.71533", "test_cls_56_accuracy": "0.71667", "test_attribute_id_73_accuracy": "0.83333", 
"test_attribute_id_11_accuracy": "0.66538", "test_attribute_id_61_accuracy": "0.83153", "test_attribute_id_52_accuracy": "0.75223", "test_attribute_id_53_accuracy": "0.78659", "test_attribute_id_30_accuracy": "0.88333", "test_attribute_id_81_accuracy": "0.9", "test_attribute_id_24_accuracy": "0.98333", "test_attribute_id_37_accuracy": "0.88084", "test_attribute_id_63_accuracy": "0.9", "test_attribute_id_29_accuracy": "0.83333", "test_attribute_id_20_accuracy": "0.86667", "test_attribute_id_67_accuracy": "0.96667", "test_attribute_id_51_accuracy": "0.81667", "test_attribute_id_69_accuracy": "0.69744", "test_attribute_id_55_accuracy": "0.74854", "test_attribute_id_3_accuracy": "0.78333", "test_attribute_id_14_accuracy": "0.94444", "test_attribute_id_21_accuracy": "0.96667", "test_attribute_id_16_accuracy": "0.79615", "test_attribute_id_35_accuracy": "0.80392", "test_attribute_id_6_accuracy": "0.98333", "test_attribute_id_46_accuracy": "0.72222", "test_attribute_id_74_accuracy": "0.8", "test_attribute_id_13_accuracy": "0.91667", "test_attribute_id_76_accuracy": "0.78333", "test_attribute_id_25_accuracy": "0.9", "test_attribute_id_23_accuracy": "0.70142", "test_attribute_id_45_accuracy": "0.71698", "test_attribute_id_18_accuracy": "0.88333", "test_attribute_id_26_accuracy": "0.81667", "test_attribute_id_68_accuracy": "0.76667", "test_attribute_id_38_accuracy": "0.70846", "test_attribute_id_49_accuracy": "0.71833", "test_attribute_id_12_accuracy": "0.56667", "test_attribute_id_22_accuracy": "0.78", "test_attribute_id_2_accuracy": "0.81667", "test_attribute_id_19_accuracy": "0.83333", "test_attribute_id_31_accuracy": "0.89474", "test_attribute_id_36_accuracy": "0.88333", "test_attribute_id_57_accuracy": "1", "test_attribute_id_27_accuracy": "0.82456", "test_attribute_id_58_accuracy": "0.88333", "test_attribute_id_10_accuracy": "0.94", "test_attribute_id_79_accuracy": "0.78333", "test_attribute_id_42_accuracy": "0.9", "test_attribute_id_48_accuracy": "0.76429", 
"test_attribute_id_78_accuracy": "0.76667", "test_attribute_id_4_accuracy": "0.73333", "test_attribute_id_65_accuracy": "0.88333", "test_attribute_id_33_accuracy": "0.82759", "test_attribute_id_70_accuracy": "0.71533", "test_attribute_id_56_accuracy": "0.71667", "test_cls_8_accuracy": "0.63333", "test_cls_7_accuracy": "0.83333", "test_cls_17_accuracy": "0.81667", "test_cls_64_accuracy": "0.7", "test_cls_54_accuracy": "0.83333", "test_cls_43_accuracy": "0.78333", "test_cls_80_accuracy": "0.78333", "test_cls_1_accuracy": "0.8", "test_cls_34_accuracy": "0.81667", "test_cls_15_accuracy": "0.88333", "test_cls_47_accuracy": "0.66667", "test_cls_40_accuracy": "0.7", "test_cls_66_accuracy": "0.73333", "test_cls_32_accuracy": "0.76667", "test_cls_59_accuracy": "0.81633", "test_attribute_id_8_accuracy": "0.63333", "test_attribute_id_7_accuracy": "0.83333", "test_attribute_id_17_accuracy": "0.81667", "test_attribute_id_64_accuracy": "0.7", "test_attribute_id_54_accuracy": "0.83333", "test_attribute_id_43_accuracy": "0.78333", "test_attribute_id_80_accuracy": "0.78333", "test_attribute_id_1_accuracy": "0.8", "test_attribute_id_34_accuracy": "0.81667", "test_attribute_id_15_accuracy": "0.88333", "test_attribute_id_47_accuracy": "0.66667", "test_attribute_id_40_accuracy": "0.7", "test_attribute_id_66_accuracy": "0.73333", "test_attribute_id_32_accuracy": "0.76667", "test_attribute_id_59_accuracy": "0.81633", "test_cls_62_accuracy": "0.82143", "test_cls_39_accuracy": "0.91667", "test_cls_77_accuracy": "0.71667", "test_cls_5_accuracy": "0.85", "test_cls_82_accuracy": "0.91667", "test_cls_28_accuracy": "0.77966", "test_cls_50_accuracy": "0.73333", "test_cls_0_accuracy": "0.8", "test_cls_60_accuracy": "0.83333", "test_cls_9_accuracy": "0.75", "test_attribute_id_62_accuracy": "0.82143", "test_attribute_id_39_accuracy": "0.91667", "test_attribute_id_77_accuracy": "0.71667", "test_attribute_id_5_accuracy": "0.85", "test_attribute_id_82_accuracy": "0.91667", 
"test_attribute_id_28_accuracy": "0.77966", "test_attribute_id_50_accuracy": "0.73333", "test_attribute_id_0_accuracy": "0.8", "test_attribute_id_60_accuracy": "0.83333", "test_attribute_id_9_accuracy": "0.75", "test_cls_75_accuracy": "1", "test_cls_71_accuracy": "0.8", "test_attribute_id_75_accuracy": "1", "test_attribute_id_71_accuracy": "0.8", "test_cls_72_accuracy": "0.88889", "test_attribute_id_72_accuracy": "0.88889", "test_cls_41_accuracy": "0.86667", "test_cls_44_accuracy": "0.78431", "test_attribute_id_41_accuracy": "0.86667", "test_attribute_id_44_accuracy": "0.78431", "test_num_updates": "21428", "test_best_accuracy": "0.78025", ### "test_auroc": "0.820812", "test_auprc": "0.683473", "test_cls_2_auroc": "0.8825", "test_cls_2_auprc": "0.778447", "test_cls_3_auroc": "0.93", "test_cls_3_auprc": "0.854287", "test_cls_4_auroc": "0.87", "test_cls_4_auprc": "0.781069", "test_cls_6_auroc": "1", "test_cls_6_auprc": "1", "test_cls_10_auroc": "0.9475", "test_cls_10_auprc": "0.825548", "test_cls_11_auroc": "0.656934", "test_cls_11_auprc": "0.497199", "test_cls_12_auroc": "0.61375", "test_cls_12_auprc": "0.390274", "test_cls_13_auroc": "0.9525", "test_cls_13_auprc": "0.938802", "test_cls_14_auroc": "0.966667", "test_cls_14_auprc": "0.881944", "test_cls_16_auroc": "0.865584", "test_cls_16_auprc": "0.761077", "test_cls_18_auroc": "0.98625", "test_cls_18_auprc": "0.97462", "test_cls_19_auroc": "0.92375", "test_cls_19_auprc": "0.860601", "test_cls_20_auroc": "0.91375", "test_cls_20_auprc": "0.860014", "test_cls_21_auroc": "0.99125", "test_cls_21_auprc": "0.98031", "test_cls_22_auroc": "0.8725", "test_cls_22_auprc": "0.663597", "test_cls_23_auroc": "0.674035", "test_cls_23_auprc": "0.529741", "test_cls_24_auroc": "1", "test_cls_24_auprc": "1", "test_cls_25_auroc": "1", "test_cls_25_auprc": "1", "test_cls_26_auroc": "0.8825", "test_cls_26_auprc": "0.832486", "test_cls_27_auroc": "0.975", "test_cls_27_auprc": "0.950102", "test_cls_29_auroc": "0.71405", "test_cls_29_auprc": 
"0.305185", "test_cls_30_auroc": "0.9175", "test_cls_30_auprc": "0.89171", "test_cls_31_auroc": "0.977941", "test_cls_31_auprc": "0.954191", "test_cls_33_auroc": "0.77913", "test_cls_33_auprc": "0.496385", "test_cls_35_auroc": "0.979545", "test_cls_35_auprc": "0.937247", "test_cls_36_auroc": "0.98", "test_cls_36_auprc": "0.966176", "test_cls_37_auroc": "0.943097", "test_cls_37_auprc": "0.869614", "test_cls_38_auroc": "0.836832", "test_cls_38_auprc": "0.632153", "test_cls_42_auroc": "0.9975", "test_cls_42_auprc": "0.995455", "test_cls_45_auroc": "0.810007", "test_cls_45_auprc": "0.614562", "test_cls_46_auroc": "0.764132", "test_cls_46_auprc": "0.634483", "test_cls_48_auroc": "0.855199", "test_cls_48_auprc": "0.729519", "test_cls_49_auroc": "0.808573", "test_cls_49_auprc": "0.634935", "test_cls_51_auroc": "0.93", "test_cls_51_auprc": "0.885417", "test_cls_52_auroc": "0.645195", "test_cls_52_auprc": "0.314701", "test_cls_53_auroc": "0.858363", "test_cls_53_auprc": "0.723025", "test_cls_55_auroc": "0.811172", "test_cls_55_auprc": "0.628535", "test_cls_56_auroc": "0.86", "test_cls_56_auprc": "0.789953", "test_cls_57_auroc": "1", "test_cls_57_auprc": "1", "test_cls_58_auroc": "0.9425", "test_cls_58_auprc": "0.910785", "test_cls_61_auroc": "0.900636", "test_cls_61_auprc": "0.768744", "test_cls_63_auroc": "0.93625", "test_cls_63_auprc": "0.922303", "test_cls_65_auroc": "0.9375", "test_cls_65_auprc": "0.832466", "test_cls_67_auroc": "0.98375", "test_cls_67_auprc": "0.971739", "test_cls_68_auroc": "0.84", "test_cls_68_auprc": "0.634047", "test_cls_69_auroc": "0.708284", "test_cls_69_auprc": "0.536556", "test_cls_70_auroc": "0.911779", "test_cls_70_auprc": "0.773581", "test_cls_73_auroc": "0.60896", "test_cls_73_auprc": "0.214599", "test_cls_74_auroc": "0.785", "test_cls_74_auprc": "0.763056", "test_cls_76_auroc": "0.862222", "test_cls_76_auprc": "0.782989", "test_cls_78_auroc": "0.845", "test_cls_78_auprc": "0.745796", "test_cls_79_auroc": "0.825", "test_cls_79_auprc": 
"0.721839", "test_cls_81_auroc": "0.905", "test_cls_81_auprc": "0.917668", "test_attribute_id_2_auroc": "0.8825", "test_attribute_id_2_auprc": "0.778447", "test_attribute_id_3_auroc": "0.93", "test_attribute_id_3_auprc": "0.854287", "test_attribute_id_4_auroc": "0.87", "test_attribute_id_4_auprc": "0.781069", "test_attribute_id_6_auroc": "1", "test_attribute_id_6_auprc": "1", "test_attribute_id_10_auroc": "0.9475", "test_attribute_id_10_auprc": "0.825548", "test_attribute_id_11_auroc": "0.656934", "test_attribute_id_11_auprc": "0.497199", "test_attribute_id_12_auroc": "0.61375", "test_attribute_id_12_auprc": "0.390274", "test_attribute_id_13_auroc": "0.9525", "test_attribute_id_13_auprc": "0.938802", "test_attribute_id_14_auroc": "0.966667", "test_attribute_id_14_auprc": "0.881944", "test_attribute_id_16_auroc": "0.865584", "test_attribute_id_16_auprc": "0.761077", "test_attribute_id_18_auroc": "0.98625", "test_attribute_id_18_auprc": "0.97462", "test_attribute_id_19_auroc": "0.92375", "test_attribute_id_19_auprc": "0.860601", "test_attribute_id_20_auroc": "0.91375", "test_attribute_id_20_auprc": "0.860014", "test_attribute_id_21_auroc": "0.99125", "test_attribute_id_21_auprc": "0.98031", "test_attribute_id_22_auroc": "0.8725", "test_attribute_id_22_auprc": "0.663597", "test_attribute_id_23_auroc": "0.674035", "test_attribute_id_23_auprc": "0.529741", "test_attribute_id_24_auroc": "1", "test_attribute_id_24_auprc": "1", "test_attribute_id_25_auroc": "1", "test_attribute_id_25_auprc": "1", "test_attribute_id_26_auroc": "0.8825", "test_attribute_id_26_auprc": "0.832486", "test_attribute_id_27_auroc": "0.975", "test_attribute_id_27_auprc": "0.950102", "test_attribute_id_29_auroc": "0.71405", "test_attribute_id_29_auprc": "0.305185", "test_attribute_id_30_auroc": "0.9175", "test_attribute_id_30_auprc": "0.89171", "test_attribute_id_31_auroc": "0.977941", "test_attribute_id_31_auprc": "0.954191", "test_attribute_id_33_auroc": "0.77913", "test_attribute_id_33_auprc": 
"0.496385", "test_attribute_id_35_auroc": "0.979545", "test_attribute_id_35_auprc": "0.937247", "test_attribute_id_36_auroc": "0.98", "test_attribute_id_36_auprc": "0.966176", "test_attribute_id_37_auroc": "0.943097", "test_attribute_id_37_auprc": "0.869614", "test_attribute_id_38_auroc": "0.836832", "test_attribute_id_38_auprc": "0.632153", "test_attribute_id_42_auroc": "0.9975", "test_attribute_id_42_auprc": "0.995455", "test_attribute_id_45_auroc": "0.810007", "test_attribute_id_45_auprc": "0.614562", "test_attribute_id_46_auroc": "0.764132", "test_attribute_id_46_auprc": "0.634483", "test_attribute_id_48_auroc": "0.855199", "test_attribute_id_48_auprc": "0.729519", "test_attribute_id_49_auroc": "0.808573", "test_attribute_id_49_auprc": "0.634935", "test_attribute_id_51_auroc": "0.93", "test_attribute_id_51_auprc": "0.885417", "test_attribute_id_52_auroc": "0.645195", "test_attribute_id_52_auprc": "0.314701", "test_attribute_id_53_auroc": "0.858363", "test_attribute_id_53_auprc": "0.723025", "test_attribute_id_55_auroc": "0.811172", "test_attribute_id_55_auprc": "0.628535", "test_attribute_id_56_auroc": "0.86", "test_attribute_id_56_auprc": "0.789953", "test_attribute_id_57_auroc": "1", "test_attribute_id_57_auprc": "1", "test_attribute_id_58_auroc": "0.9425", "test_attribute_id_58_auprc": "0.910785", "test_attribute_id_61_auroc": "0.900636", "test_attribute_id_61_auprc": "0.768744", "test_attribute_id_63_auroc": "0.93625", "test_attribute_id_63_auprc": "0.922303", "test_attribute_id_65_auroc": "0.9375", "test_attribute_id_65_auprc": "0.832466", "test_attribute_id_67_auroc": "0.98375", "test_attribute_id_67_auprc": "0.971739", "test_attribute_id_68_auroc": "0.84", "test_attribute_id_68_auprc": "0.634047", "test_attribute_id_69_auroc": "0.708284", "test_attribute_id_69_auprc": "0.536556", "test_attribute_id_70_auroc": "0.911779", "test_attribute_id_70_auprc": "0.773581", "test_attribute_id_73_auroc": "0.60896", "test_attribute_id_73_auprc": "0.214599", 
"test_attribute_id_74_auroc": "0.785", "test_attribute_id_74_auprc": "0.763056", "test_attribute_id_76_auroc": "0.862222", "test_attribute_id_76_auprc": "0.782989", "test_attribute_id_78_auroc": "0.845", "test_attribute_id_78_auprc": "0.745796", "test_attribute_id_79_auroc": "0.825", "test_attribute_id_79_auprc": "0.721839", "test_attribute_id_81_auroc": "0.905", "test_attribute_id_81_auprc": "0.917668", "test_cls_1_auroc": "0.81875", "test_cls_1_auprc": "0.763658", "test_cls_7_auroc": "0.88125", "test_cls_7_auprc": "0.752982", "test_cls_8_auroc": "0.6775", "test_cls_8_auprc": "0.426755", "test_cls_15_auroc": "0.9775", "test_cls_15_auprc": "0.949399", "test_cls_17_auroc": "0.855", "test_cls_17_auprc": "0.833072", "test_cls_32_auroc": "0.87625", "test_cls_32_auprc": "0.804281", "test_cls_34_auroc": "0.89", "test_cls_34_auprc": "0.854619", "test_cls_40_auroc": "0.75375", "test_cls_40_auprc": "0.657261", "test_cls_43_auroc": "0.955", "test_cls_43_auprc": "0.92042", "test_cls_47_auroc": "0.72375", "test_cls_47_auprc": "0.570795", "test_cls_54_auroc": "0.7875", "test_cls_54_auprc": "0.65625", "test_cls_59_auroc": "0.625", "test_cls_59_auprc": "0.36227", "test_cls_64_auroc": "0.65125", "test_cls_64_auprc": "0.577074", "test_cls_66_auroc": "0.7775", "test_cls_66_auprc": "0.632891", "test_cls_80_auroc": "0.745", "test_cls_80_auprc": "0.639197", "test_attribute_id_1_auroc": "0.81875", "test_attribute_id_1_auprc": "0.763658", "test_attribute_id_7_auroc": "0.88125", "test_attribute_id_7_auprc": "0.752982", "test_attribute_id_8_auroc": "0.6775", "test_attribute_id_8_auprc": "0.426755", "test_attribute_id_15_auroc": "0.9775", "test_attribute_id_15_auprc": "0.949399", "test_attribute_id_17_auroc": "0.855", "test_attribute_id_17_auprc": "0.833072", "test_attribute_id_32_auroc": "0.87625", "test_attribute_id_32_auprc": "0.804281", "test_attribute_id_34_auroc": "0.89", "test_attribute_id_34_auprc": "0.854619", "test_attribute_id_40_auroc": "0.75375", "test_attribute_id_40_auprc": 
"0.657261", "test_attribute_id_43_auroc": "0.955", "test_attribute_id_43_auprc": "0.92042", "test_attribute_id_47_auroc": "0.72375", "test_attribute_id_47_auprc": "0.570795", "test_attribute_id_54_auroc": "0.7875", "test_attribute_id_54_auprc": "0.65625", "test_attribute_id_59_auroc": "0.625", "test_attribute_id_59_auprc": "0.36227", "test_attribute_id_64_auroc": "0.65125", "test_attribute_id_64_auprc": "0.577074", "test_attribute_id_66_auroc": "0.7775", "test_attribute_id_66_auprc": "0.632891", "test_attribute_id_80_auroc": "0.745", "test_attribute_id_80_auprc": "0.639197", "test_cls_0_auroc": "0.7525", "test_cls_0_auprc": "0.698853", "test_cls_5_auroc": "0.9", "test_cls_5_auprc": "0.858755", "test_cls_9_auroc": "0.8225", "test_cls_9_auprc": "0.75473", "test_cls_28_auroc": "0.855263", "test_cls_28_auprc": "0.753322", "test_cls_39_auroc": "0.9725", "test_cls_39_auprc": "0.932268", "test_cls_50_auroc": "0.71125", "test_cls_50_auprc": "0.582936", "test_cls_60_auroc": "0.92625", "test_cls_60_auprc": "0.875121", "test_cls_62_auroc": "0.973437", "test_cls_62_auprc": "0.935572", "test_cls_77_auroc": "0.69625", "test_cls_77_auprc": "0.593529", "test_cls_82_auroc": "0.94625", "test_cls_82_auprc": "0.834483", "test_attribute_id_0_auroc": "0.7525", "test_attribute_id_0_auprc": "0.698853", "test_attribute_id_5_auroc": "0.9", "test_attribute_id_5_auprc": "0.858755", "test_attribute_id_9_auroc": "0.8225", "test_attribute_id_9_auprc": "0.75473", "test_attribute_id_28_auroc": "0.855263", "test_attribute_id_28_auprc": "0.753322", "test_attribute_id_39_auroc": "0.9725", "test_attribute_id_39_auprc": "0.932268", "test_attribute_id_50_auroc": "0.71125", "test_attribute_id_50_auprc": "0.582936", "test_attribute_id_60_auroc": "0.92625", "test_attribute_id_60_auprc": "0.875121", "test_attribute_id_62_auroc": "0.973437", "test_attribute_id_62_auprc": "0.935572", "test_attribute_id_77_auroc": "0.69625", "test_attribute_id_77_auprc": "0.593529", "test_attribute_id_82_auroc": "0.94625", 
"test_attribute_id_82_auprc": "0.834483", "test_cls_71_auroc": "0.80375", "test_cls_71_auprc": "0.755221", "test_cls_75_auroc": "1", "test_cls_75_auprc": "1", "test_attribute_id_71_auroc": "0.80375", "test_attribute_id_71_auprc": "0.755221", "test_attribute_id_75_auroc": "1", "test_attribute_id_75_auprc": "1", "test_cls_72_auroc": "0.977778", "test_cls_72_auprc": "0.916667", "test_attribute_id_72_auroc": "0.977778", "test_attribute_id_72_auprc": "0.916667", "test_cls_41_auroc": "0.93375", "test_cls_41_auprc": "0.835404", "test_cls_44_auroc": "0.863636", "test_cls_44_auprc": "0.662103", "test_attribute_id_41_auroc": "0.93375", "test_attribute_id_41_auprc": "0.835404", "test_attribute_id_44_auroc": "0.863636", "test_attribute_id_44_auprc": "0.662103"}

Answer 3 · 2024-07-31T11:47:03.000Z

This criterion currently calculates the micro-averaged AUROC by default, not the macro-averaged one.
Since the multi_head_binary_cross_entropy criterion does not support a macro-averaging option at the moment, you will need to take the AUROC logged for each attribute_id and average them manually.
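
A minimal sketch of that manual averaging, assuming the record after the log prefix is valid JSON and that per-attribute keys follow the `test_attribute_id_<id>_auroc` pattern seen in the log above. The helper name `macro_auroc` and the shortened sample line are illustrative only and are not part of the repository.

```python
import json
import re


def macro_auroc(record, prefix="test_attribute_id"):
    """Average the per-attribute AUROC values found in one logged record.

    Returns (macro_average, number_of_attributes). The key pattern is an
    assumption based on the keys visible in the log posted in this thread.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)_auroc$")
    per_attribute = {}
    for key, value in record.items():
        match = pattern.match(key)
        if match:
            # Logged values are strings such as "0.8825".
            per_attribute[int(match.group(1))] = float(value)
    if not per_attribute:
        raise ValueError("no per-attribute AUROC keys found in the record")
    return sum(per_attribute.values()) / len(per_attribute), len(per_attribute)


# Shortened sample line; a real line would contain one entry per attribute.
raw_line = (
    '[2024-07-21 20:48:16,450][test][INFO] - '
    '{"test_auroc": "0.820812", '
    '"test_attribute_id_2_auroc": "0.8825", '
    '"test_attribute_id_2_auprc": "0.778447", '
    '"test_attribute_id_3_auroc": "0.93", '
    '"test_attribute_id_4_auroc": "0.87"}'
)

# Drop the logger prefix and parse the JSON payload.
record = json.loads(raw_line[raw_line.index("{"):])
macro, n_attributes = macro_auroc(record)
print(f"macro AUROC over {n_attributes} attributes: {macro:.6f}")
```

On a full log line it is worth checking that the returned attribute count is 83 before comparing the average with the paper.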

Answer 4 · 2024-07-31T17:20:25.000Z

Oh, got it. Thanks a lot!

I am also unable to reproduce the results of the LLM modelling experiment. I believe these are the ones reported in Table 5.

wandb: test_sampled/question_type2_0_em_accuracy 0.64297
wandb: test_sampled/question_type2_1_em_accuracy 0.31034
wandb: test_sampled/question_type2_2_em_accuracy 0.2467
wandb: test_sampled/question_type2_3_em_accuracy 0.5461
wandb: test_sampled/question_type2_5_em_accuracy 0.04217
wandb: test_sampled/question_type2_6_em_accuracy 0.55181
wandb: test_sampled/question_type2_8_em_accuracy 0.00911

Might changing the threshold from 0.5 to the one given by the Youden index help here, since most of the errors come from the classifier model's predictions?
Or could anything else have gone wrong? I didn't make any changes to the code other than the checkpoint.
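
For reference, a minimal sketch of what picking a threshold by the Youden index means: choose the cut-off that maximizes J = TPR - FPR on held-out predictions. The function name `youden_threshold` and the toy labels and scores are hypothetical, and whether this would close the gap to the paper's numbers is not established in this thread.

```python
def youden_threshold(labels, scores):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    A sample is predicted positive when its score is >= threshold.
    Candidate thresholds are the distinct observed scores.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_threshold, best_j = None, float("-inf")
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        j = tp / positives - fp / negatives
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy data for a single attribute (made up for illustration).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.35, 0.5, 0.7, 0.9]

threshold, j_statistic = youden_threshold(labels, scores)
print(f"threshold={threshold}, J={j_statistic}")
```

The threshold would normally be selected per attribute on the validation split and then applied unchanged to the test split, so that the test set does not influence the choice.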

Answer 5 · 2024-08-01T05:27:49.000Z

Can you confirm if you train SE-WRN model without changing any default hyper-parameters (e.g., # of epochs, learning rate, ...)?

Answer 6 · 2024-08-02T09:31:47.000Z

Yes, I didn't change any parameters. Since the AUCs are good, changing the threshold might help. I will give it a try.

Answer 7 · 2024-08-03T09:30:33.000Z

Sorry, I am not able to reproduce the scores, and I'm not even coming close. Where do you think the problem might be?

Answer 8 · 2024-08-04T16:34:47.000Z

Did you sample 10% from the test set and use them for LLM evaluation?
Could you let me know which OpenAI model you are using now?

Answer 9 · 2024-08-05T04:47:56.000Z

Yes, I followed each and every step in the README. The OpenAI model I am using is GPT-4o-mini, though, which is supposed to be better than gpt-3.5.

Answer 10 · 2024-08-05T06:29:28.000Z

If I share the weights file for SE-WRN that has been used for the experiments in the original paper, could you run the LLM experiments using it? For now, I have no credits to run the experiments from my end.
If you are okay with it, please let me know your email address.

Answer 11 · 2024-08-05T06:31:05.000Z

Thanks, that would be great.
parthagrawal02@gmail.com

Answer 12 · 2024-08-05T07:39:06.000Z

I've just sent an email to you. Please check it.

Answer 13 · 2024-08-05T07:41:26.000Z

Received, thanks. I will update with the results.

Answer 14 · 2024-08-28T12:22:08.000Z

Hi, any updates on this issue?
If not, I will close this issue.