Unable to recreate the results for UpperBound Experiments
Hey, thanks for the code and the QA dataset.
I am trying to reproduce the upper-bound results from the paper on the PTB-XL dataset. I followed every step in the README, but my results are worse than the reported ones. Is there anything I am missing?
For the SE-WRN model, I am getting a macro-averaged AUC of 0.824.
Thanks
Hi,
Could you let me know how you calculated the macro-AUROC?
Did you get AUROCs for each of 83 attributes and macro-average them?
It logs macro AUROC, doesn't it?
I checked the code; it averages across all 83 attributes.
criterion:
  _name: multi_head_binary_cross_entropy
  report_auc: true
  log_per_class: true
  per_log_keys: [attribute_id]
Example:
"test_auroc": "0.820812",
[2024-07-21 20:48:16,450][test][INFO] - {"epoch": 22, "test_loss": "1.091", "test_nsignals": "125.077", "test_accuracy": "0.77429", "test_cls_73_accuracy": "0.83333", "test_cls_11_accuracy": "0.66538", "test_cls_61_accuracy": "0.83153", "test_cls_52_accuracy": "0.75223", "test_cls_53_accuracy": "0.78659", "test_cls_30_accuracy": "0.88333", "test_cls_81_accuracy": "0.9", "test_cls_24_accuracy": "0.98333", "test_cls_37_accuracy": "0.88084", "test_cls_63_accuracy": "0.9", "test_cls_29_accuracy": "0.83333", "test_cls_20_accuracy": "0.86667", "test_cls_67_accuracy": "0.96667", "test_cls_51_accuracy": "0.81667", "test_cls_69_accuracy": "0.69744", "test_cls_55_accuracy": "0.74854", "test_cls_3_accuracy": "0.78333", "test_cls_14_accuracy": "0.94444", "test_cls_21_accuracy": "0.96667", "test_cls_16_accuracy": "0.79615", "test_cls_35_accuracy": "0.80392", "test_cls_6_accuracy": "0.98333", "test_cls_46_accuracy": "0.72222", "test_cls_74_accuracy": "0.8", "test_cls_13_accuracy": "0.91667", "test_cls_76_accuracy": "0.78333", "test_cls_25_accuracy": "0.9", "test_cls_23_accuracy": "0.70142", "test_cls_45_accuracy": "0.71698", "test_cls_18_accuracy": "0.88333", "test_cls_26_accuracy": "0.81667", "test_cls_68_accuracy": "0.76667", "test_cls_38_accuracy": "0.70846", "test_cls_49_accuracy": "0.71833", "test_cls_12_accuracy": "0.56667", "test_cls_22_accuracy": "0.78", "test_cls_2_accuracy": "0.81667", "test_cls_19_accuracy": "0.83333", "test_cls_31_accuracy": "0.89474", "test_cls_36_accuracy": "0.88333", "test_cls_57_accuracy": "1", "test_cls_27_accuracy": "0.82456", "test_cls_58_accuracy": "0.88333", "test_cls_10_accuracy": "0.94", "test_cls_79_accuracy": "0.78333", "test_cls_42_accuracy": "0.9", "test_cls_48_accuracy": "0.76429", "test_cls_78_accuracy": "0.76667", "test_cls_4_accuracy": "0.73333", "test_cls_65_accuracy": "0.88333", "test_cls_33_accuracy": "0.82759", "test_cls_70_accuracy": "0.71533", "test_cls_56_accuracy": "0.71667", "test_attribute_id_73_accuracy": "0.83333", 
"test_attribute_id_11_accuracy": "0.66538", "test_attribute_id_61_accuracy": "0.83153", "test_attribute_id_52_accuracy": "0.75223", "test_attribute_id_53_accuracy": "0.78659", "test_attribute_id_30_accuracy": "0.88333", "test_attribute_id_81_accuracy": "0.9", "test_attribute_id_24_accuracy": "0.98333", "test_attribute_id_37_accuracy": "0.88084", "test_attribute_id_63_accuracy": "0.9", "test_attribute_id_29_accuracy": "0.83333", "test_attribute_id_20_accuracy": "0.86667", "test_attribute_id_67_accuracy": "0.96667", "test_attribute_id_51_accuracy": "0.81667", "test_attribute_id_69_accuracy": "0.69744", "test_attribute_id_55_accuracy": "0.74854", "test_attribute_id_3_accuracy": "0.78333", "test_attribute_id_14_accuracy": "0.94444", "test_attribute_id_21_accuracy": "0.96667", "test_attribute_id_16_accuracy": "0.79615", "test_attribute_id_35_accuracy": "0.80392", "test_attribute_id_6_accuracy": "0.98333", "test_attribute_id_46_accuracy": "0.72222", "test_attribute_id_74_accuracy": "0.8", "test_attribute_id_13_accuracy": "0.91667", "test_attribute_id_76_accuracy": "0.78333", "test_attribute_id_25_accuracy": "0.9", "test_attribute_id_23_accuracy": "0.70142", "test_attribute_id_45_accuracy": "0.71698", "test_attribute_id_18_accuracy": "0.88333", "test_attribute_id_26_accuracy": "0.81667", "test_attribute_id_68_accuracy": "0.76667", "test_attribute_id_38_accuracy": "0.70846", "test_attribute_id_49_accuracy": "0.71833", "test_attribute_id_12_accuracy": "0.56667", "test_attribute_id_22_accuracy": "0.78", "test_attribute_id_2_accuracy": "0.81667", "test_attribute_id_19_accuracy": "0.83333", "test_attribute_id_31_accuracy": "0.89474", "test_attribute_id_36_accuracy": "0.88333", "test_attribute_id_57_accuracy": "1", "test_attribute_id_27_accuracy": "0.82456", "test_attribute_id_58_accuracy": "0.88333", "test_attribute_id_10_accuracy": "0.94", "test_attribute_id_79_accuracy": "0.78333", "test_attribute_id_42_accuracy": "0.9", "test_attribute_id_48_accuracy": "0.76429", 
"test_attribute_id_78_accuracy": "0.76667", "test_attribute_id_4_accuracy": "0.73333", "test_attribute_id_65_accuracy": "0.88333", "test_attribute_id_33_accuracy": "0.82759", "test_attribute_id_70_accuracy": "0.71533", "test_attribute_id_56_accuracy": "0.71667", "test_cls_8_accuracy": "0.63333", "test_cls_7_accuracy": "0.83333", "test_cls_17_accuracy": "0.81667", "test_cls_64_accuracy": "0.7", "test_cls_54_accuracy": "0.83333", "test_cls_43_accuracy": "0.78333", "test_cls_80_accuracy": "0.78333", "test_cls_1_accuracy": "0.8", "test_cls_34_accuracy": "0.81667", "test_cls_15_accuracy": "0.88333", "test_cls_47_accuracy": "0.66667", "test_cls_40_accuracy": "0.7", "test_cls_66_accuracy": "0.73333", "test_cls_32_accuracy": "0.76667", "test_cls_59_accuracy": "0.81633", "test_attribute_id_8_accuracy": "0.63333", "test_attribute_id_7_accuracy": "0.83333", "test_attribute_id_17_accuracy": "0.81667", "test_attribute_id_64_accuracy": "0.7", "test_attribute_id_54_accuracy": "0.83333", "test_attribute_id_43_accuracy": "0.78333", "test_attribute_id_80_accuracy": "0.78333", "test_attribute_id_1_accuracy": "0.8", "test_attribute_id_34_accuracy": "0.81667", "test_attribute_id_15_accuracy": "0.88333", "test_attribute_id_47_accuracy": "0.66667", "test_attribute_id_40_accuracy": "0.7", "test_attribute_id_66_accuracy": "0.73333", "test_attribute_id_32_accuracy": "0.76667", "test_attribute_id_59_accuracy": "0.81633", "test_cls_62_accuracy": "0.82143", "test_cls_39_accuracy": "0.91667", "test_cls_77_accuracy": "0.71667", "test_cls_5_accuracy": "0.85", "test_cls_82_accuracy": "0.91667", "test_cls_28_accuracy": "0.77966", "test_cls_50_accuracy": "0.73333", "test_cls_0_accuracy": "0.8", "test_cls_60_accuracy": "0.83333", "test_cls_9_accuracy": "0.75", "test_attribute_id_62_accuracy": "0.82143", "test_attribute_id_39_accuracy": "0.91667", "test_attribute_id_77_accuracy": "0.71667", "test_attribute_id_5_accuracy": "0.85", "test_attribute_id_82_accuracy": "0.91667", 
"test_attribute_id_28_accuracy": "0.77966", "test_attribute_id_50_accuracy": "0.73333", "test_attribute_id_0_accuracy": "0.8", "test_attribute_id_60_accuracy": "0.83333", "test_attribute_id_9_accuracy": "0.75", "test_cls_75_accuracy": "1", "test_cls_71_accuracy": "0.8", "test_attribute_id_75_accuracy": "1", "test_attribute_id_71_accuracy": "0.8", "test_cls_72_accuracy": "0.88889", "test_attribute_id_72_accuracy": "0.88889", "test_cls_41_accuracy": "0.86667", "test_cls_44_accuracy": "0.78431", "test_attribute_id_41_accuracy": "0.86667", "test_attribute_id_44_accuracy": "0.78431", "test_num_updates": "21428", "test_best_accuracy": "0.78025", ### "test_auroc": "0.820812", "test_auprc": "0.683473", "test_cls_2_auroc": "0.8825", "test_cls_2_auprc": "0.778447", "test_cls_3_auroc": "0.93", "test_cls_3_auprc": "0.854287", "test_cls_4_auroc": "0.87", "test_cls_4_auprc": "0.781069", "test_cls_6_auroc": "1", "test_cls_6_auprc": "1", "test_cls_10_auroc": "0.9475", "test_cls_10_auprc": "0.825548", "test_cls_11_auroc": "0.656934", "test_cls_11_auprc": "0.497199", "test_cls_12_auroc": "0.61375", "test_cls_12_auprc": "0.390274", "test_cls_13_auroc": "0.9525", "test_cls_13_auprc": "0.938802", "test_cls_14_auroc": "0.966667", "test_cls_14_auprc": "0.881944", "test_cls_16_auroc": "0.865584", "test_cls_16_auprc": "0.761077", "test_cls_18_auroc": "0.98625", "test_cls_18_auprc": "0.97462", "test_cls_19_auroc": "0.92375", "test_cls_19_auprc": "0.860601", "test_cls_20_auroc": "0.91375", "test_cls_20_auprc": "0.860014", "test_cls_21_auroc": "0.99125", "test_cls_21_auprc": "0.98031", "test_cls_22_auroc": "0.8725", "test_cls_22_auprc": "0.663597", "test_cls_23_auroc": "0.674035", "test_cls_23_auprc": "0.529741", "test_cls_24_auroc": "1", "test_cls_24_auprc": "1", "test_cls_25_auroc": "1", "test_cls_25_auprc": "1", "test_cls_26_auroc": "0.8825", "test_cls_26_auprc": "0.832486", "test_cls_27_auroc": "0.975", "test_cls_27_auprc": "0.950102", "test_cls_29_auroc": "0.71405", "test_cls_29_auprc": 
"0.305185", "test_cls_30_auroc": "0.9175", "test_cls_30_auprc": "0.89171", "test_cls_31_auroc": "0.977941", "test_cls_31_auprc": "0.954191", "test_cls_33_auroc": "0.77913", "test_cls_33_auprc": "0.496385", "test_cls_35_auroc": "0.979545", "test_cls_35_auprc": "0.937247", "test_cls_36_auroc": "0.98", "test_cls_36_auprc": "0.966176", "test_cls_37_auroc": "0.943097", "test_cls_37_auprc": "0.869614", "test_cls_38_auroc": "0.836832", "test_cls_38_auprc": "0.632153", "test_cls_42_auroc": "0.9975", "test_cls_42_auprc": "0.995455", "test_cls_45_auroc": "0.810007", "test_cls_45_auprc": "0.614562", "test_cls_46_auroc": "0.764132", "test_cls_46_auprc": "0.634483", "test_cls_48_auroc": "0.855199", "test_cls_48_auprc": "0.729519", "test_cls_49_auroc": "0.808573", "test_cls_49_auprc": "0.634935", "test_cls_51_auroc": "0.93", "test_cls_51_auprc": "0.885417", "test_cls_52_auroc": "0.645195", "test_cls_52_auprc": "0.314701", "test_cls_53_auroc": "0.858363", "test_cls_53_auprc": "0.723025", "test_cls_55_auroc": "0.811172", "test_cls_55_auprc": "0.628535", "test_cls_56_auroc": "0.86", "test_cls_56_auprc": "0.789953", "test_cls_57_auroc": "1", "test_cls_57_auprc": "1", "test_cls_58_auroc": "0.9425", "test_cls_58_auprc": "0.910785", "test_cls_61_auroc": "0.900636", "test_cls_61_auprc": "0.768744", "test_cls_63_auroc": "0.93625", "test_cls_63_auprc": "0.922303", "test_cls_65_auroc": "0.9375", "test_cls_65_auprc": "0.832466", "test_cls_67_auroc": "0.98375", "test_cls_67_auprc": "0.971739", "test_cls_68_auroc": "0.84", "test_cls_68_auprc": "0.634047", "test_cls_69_auroc": "0.708284", "test_cls_69_auprc": "0.536556", "test_cls_70_auroc": "0.911779", "test_cls_70_auprc": "0.773581", "test_cls_73_auroc": "0.60896", "test_cls_73_auprc": "0.214599", "test_cls_74_auroc": "0.785", "test_cls_74_auprc": "0.763056", "test_cls_76_auroc": "0.862222", "test_cls_76_auprc": "0.782989", "test_cls_78_auroc": "0.845", "test_cls_78_auprc": "0.745796", "test_cls_79_auroc": "0.825", "test_cls_79_auprc": 
"0.721839", "test_cls_81_auroc": "0.905", "test_cls_81_auprc": "0.917668", "test_attribute_id_2_auroc": "0.8825", "test_attribute_id_2_auprc": "0.778447", "test_attribute_id_3_auroc": "0.93", "test_attribute_id_3_auprc": "0.854287", "test_attribute_id_4_auroc": "0.87", "test_attribute_id_4_auprc": "0.781069", "test_attribute_id_6_auroc": "1", "test_attribute_id_6_auprc": "1", "test_attribute_id_10_auroc": "0.9475", "test_attribute_id_10_auprc": "0.825548", "test_attribute_id_11_auroc": "0.656934", "test_attribute_id_11_auprc": "0.497199", "test_attribute_id_12_auroc": "0.61375", "test_attribute_id_12_auprc": "0.390274", "test_attribute_id_13_auroc": "0.9525", "test_attribute_id_13_auprc": "0.938802", "test_attribute_id_14_auroc": "0.966667", "test_attribute_id_14_auprc": "0.881944", "test_attribute_id_16_auroc": "0.865584", "test_attribute_id_16_auprc": "0.761077", "test_attribute_id_18_auroc": "0.98625", "test_attribute_id_18_auprc": "0.97462", "test_attribute_id_19_auroc": "0.92375", "test_attribute_id_19_auprc": "0.860601", "test_attribute_id_20_auroc": "0.91375", "test_attribute_id_20_auprc": "0.860014", "test_attribute_id_21_auroc": "0.99125", "test_attribute_id_21_auprc": "0.98031", "test_attribute_id_22_auroc": "0.8725", "test_attribute_id_22_auprc": "0.663597", "test_attribute_id_23_auroc": "0.674035", "test_attribute_id_23_auprc": "0.529741", "test_attribute_id_24_auroc": "1", "test_attribute_id_24_auprc": "1", "test_attribute_id_25_auroc": "1", "test_attribute_id_25_auprc": "1", "test_attribute_id_26_auroc": "0.8825", "test_attribute_id_26_auprc": "0.832486", "test_attribute_id_27_auroc": "0.975", "test_attribute_id_27_auprc": "0.950102", "test_attribute_id_29_auroc": "0.71405", "test_attribute_id_29_auprc": "0.305185", "test_attribute_id_30_auroc": "0.9175", "test_attribute_id_30_auprc": "0.89171", "test_attribute_id_31_auroc": "0.977941", "test_attribute_id_31_auprc": "0.954191", "test_attribute_id_33_auroc": "0.77913", "test_attribute_id_33_auprc": 
"0.496385", "test_attribute_id_35_auroc": "0.979545", "test_attribute_id_35_auprc": "0.937247", "test_attribute_id_36_auroc": "0.98", "test_attribute_id_36_auprc": "0.966176", "test_attribute_id_37_auroc": "0.943097", "test_attribute_id_37_auprc": "0.869614", "test_attribute_id_38_auroc": "0.836832", "test_attribute_id_38_auprc": "0.632153", "test_attribute_id_42_auroc": "0.9975", "test_attribute_id_42_auprc": "0.995455", "test_attribute_id_45_auroc": "0.810007", "test_attribute_id_45_auprc": "0.614562", "test_attribute_id_46_auroc": "0.764132", "test_attribute_id_46_auprc": "0.634483", "test_attribute_id_48_auroc": "0.855199", "test_attribute_id_48_auprc": "0.729519", "test_attribute_id_49_auroc": "0.808573", "test_attribute_id_49_auprc": "0.634935", "test_attribute_id_51_auroc": "0.93", "test_attribute_id_51_auprc": "0.885417", "test_attribute_id_52_auroc": "0.645195", "test_attribute_id_52_auprc": "0.314701", "test_attribute_id_53_auroc": "0.858363", "test_attribute_id_53_auprc": "0.723025", "test_attribute_id_55_auroc": "0.811172", "test_attribute_id_55_auprc": "0.628535", "test_attribute_id_56_auroc": "0.86", "test_attribute_id_56_auprc": "0.789953", "test_attribute_id_57_auroc": "1", "test_attribute_id_57_auprc": "1", "test_attribute_id_58_auroc": "0.9425", "test_attribute_id_58_auprc": "0.910785", "test_attribute_id_61_auroc": "0.900636", "test_attribute_id_61_auprc": "0.768744", "test_attribute_id_63_auroc": "0.93625", "test_attribute_id_63_auprc": "0.922303", "test_attribute_id_65_auroc": "0.9375", "test_attribute_id_65_auprc": "0.832466", "test_attribute_id_67_auroc": "0.98375", "test_attribute_id_67_auprc": "0.971739", "test_attribute_id_68_auroc": "0.84", "test_attribute_id_68_auprc": "0.634047", "test_attribute_id_69_auroc": "0.708284", "test_attribute_id_69_auprc": "0.536556", "test_attribute_id_70_auroc": "0.911779", "test_attribute_id_70_auprc": "0.773581", "test_attribute_id_73_auroc": "0.60896", "test_attribute_id_73_auprc": "0.214599", 
"test_attribute_id_74_auroc": "0.785", "test_attribute_id_74_auprc": "0.763056", "test_attribute_id_76_auroc": "0.862222", "test_attribute_id_76_auprc": "0.782989", "test_attribute_id_78_auroc": "0.845", "test_attribute_id_78_auprc": "0.745796", "test_attribute_id_79_auroc": "0.825", "test_attribute_id_79_auprc": "0.721839", "test_attribute_id_81_auroc": "0.905", "test_attribute_id_81_auprc": "0.917668", "test_cls_1_auroc": "0.81875", "test_cls_1_auprc": "0.763658", "test_cls_7_auroc": "0.88125", "test_cls_7_auprc": "0.752982", "test_cls_8_auroc": "0.6775", "test_cls_8_auprc": "0.426755", "test_cls_15_auroc": "0.9775", "test_cls_15_auprc": "0.949399", "test_cls_17_auroc": "0.855", "test_cls_17_auprc": "0.833072", "test_cls_32_auroc": "0.87625", "test_cls_32_auprc": "0.804281", "test_cls_34_auroc": "0.89", "test_cls_34_auprc": "0.854619", "test_cls_40_auroc": "0.75375", "test_cls_40_auprc": "0.657261", "test_cls_43_auroc": "0.955", "test_cls_43_auprc": "0.92042", "test_cls_47_auroc": "0.72375", "test_cls_47_auprc": "0.570795", "test_cls_54_auroc": "0.7875", "test_cls_54_auprc": "0.65625", "test_cls_59_auroc": "0.625", "test_cls_59_auprc": "0.36227", "test_cls_64_auroc": "0.65125", "test_cls_64_auprc": "0.577074", "test_cls_66_auroc": "0.7775", "test_cls_66_auprc": "0.632891", "test_cls_80_auroc": "0.745", "test_cls_80_auprc": "0.639197", "test_attribute_id_1_auroc": "0.81875", "test_attribute_id_1_auprc": "0.763658", "test_attribute_id_7_auroc": "0.88125", "test_attribute_id_7_auprc": "0.752982", "test_attribute_id_8_auroc": "0.6775", "test_attribute_id_8_auprc": "0.426755", "test_attribute_id_15_auroc": "0.9775", "test_attribute_id_15_auprc": "0.949399", "test_attribute_id_17_auroc": "0.855", "test_attribute_id_17_auprc": "0.833072", "test_attribute_id_32_auroc": "0.87625", "test_attribute_id_32_auprc": "0.804281", "test_attribute_id_34_auroc": "0.89", "test_attribute_id_34_auprc": "0.854619", "test_attribute_id_40_auroc": "0.75375", "test_attribute_id_40_auprc": 
"0.657261", "test_attribute_id_43_auroc": "0.955", "test_attribute_id_43_auprc": "0.92042", "test_attribute_id_47_auroc": "0.72375", "test_attribute_id_47_auprc": "0.570795", "test_attribute_id_54_auroc": "0.7875", "test_attribute_id_54_auprc": "0.65625", "test_attribute_id_59_auroc": "0.625", "test_attribute_id_59_auprc": "0.36227", "test_attribute_id_64_auroc": "0.65125", "test_attribute_id_64_auprc": "0.577074", "test_attribute_id_66_auroc": "0.7775", "test_attribute_id_66_auprc": "0.632891", "test_attribute_id_80_auroc": "0.745", "test_attribute_id_80_auprc": "0.639197", "test_cls_0_auroc": "0.7525", "test_cls_0_auprc": "0.698853", "test_cls_5_auroc": "0.9", "test_cls_5_auprc": "0.858755", "test_cls_9_auroc": "0.8225", "test_cls_9_auprc": "0.75473", "test_cls_28_auroc": "0.855263", "test_cls_28_auprc": "0.753322", "test_cls_39_auroc": "0.9725", "test_cls_39_auprc": "0.932268", "test_cls_50_auroc": "0.71125", "test_cls_50_auprc": "0.582936", "test_cls_60_auroc": "0.92625", "test_cls_60_auprc": "0.875121", "test_cls_62_auroc": "0.973437", "test_cls_62_auprc": "0.935572", "test_cls_77_auroc": "0.69625", "test_cls_77_auprc": "0.593529", "test_cls_82_auroc": "0.94625", "test_cls_82_auprc": "0.834483", "test_attribute_id_0_auroc": "0.7525", "test_attribute_id_0_auprc": "0.698853", "test_attribute_id_5_auroc": "0.9", "test_attribute_id_5_auprc": "0.858755", "test_attribute_id_9_auroc": "0.8225", "test_attribute_id_9_auprc": "0.75473", "test_attribute_id_28_auroc": "0.855263", "test_attribute_id_28_auprc": "0.753322", "test_attribute_id_39_auroc": "0.9725", "test_attribute_id_39_auprc": "0.932268", "test_attribute_id_50_auroc": "0.71125", "test_attribute_id_50_auprc": "0.582936", "test_attribute_id_60_auroc": "0.92625", "test_attribute_id_60_auprc": "0.875121", "test_attribute_id_62_auroc": "0.973437", "test_attribute_id_62_auprc": "0.935572", "test_attribute_id_77_auroc": "0.69625", "test_attribute_id_77_auprc": "0.593529", "test_attribute_id_82_auroc": "0.94625", 
"test_attribute_id_82_auprc": "0.834483", "test_cls_71_auroc": "0.80375", "test_cls_71_auprc": "0.755221", "test_cls_75_auroc": "1", "test_cls_75_auprc": "1", "test_attribute_id_71_auroc": "0.80375", "test_attribute_id_71_auprc": "0.755221", "test_attribute_id_75_auroc": "1", "test_attribute_id_75_auprc": "1", "test_cls_72_auroc": "0.977778", "test_cls_72_auprc": "0.916667", "test_attribute_id_72_auroc": "0.977778", "test_attribute_id_72_auprc": "0.916667", "test_cls_41_auroc": "0.93375", "test_cls_41_auprc": "0.835404", "test_cls_44_auroc": "0.863636", "test_cls_44_auprc": "0.662103", "test_attribute_id_41_auroc": "0.93375", "test_attribute_id_41_auprc": "0.835404", "test_attribute_id_44_auroc": "0.863636", "test_attribute_id_44_auprc": "0.662103"}
This criterion currently calculates MICRO AUROC by default, not MACRO.
Since the multi_head_binary_cross_entropy criterion does not support a macro-averaging option at the moment, you may need to get the AUROC for each attribute_id and average them manually.
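The manual averaging suggested above could be sketched roughly as follows. This is only an illustration, not code from the repository: the helper name `macro_auroc_from_log_line` is hypothetical, and it assumes the logged line is valid JSON after the logger prefix (the `###` marker in the pasted example would have to be removed first) with per-attribute keys of the form `test_attribute_id_<id>_auroc`, as in the log above.

```python
import json
import re

# Matches keys such as "test_attribute_id_12_auroc" and captures the attribute id.
# The key pattern is taken from the pasted log; the helper itself is hypothetical.
_ATTR_AUROC = re.compile(r"^test_attribute_id_(\d+)_auroc$")


def macro_auroc_from_log_line(line: str) -> float:
    """Return the unweighted mean of the per-attribute AUROCs in one log line."""
    # The JSON payload starts at the first "{" after the logger prefix.
    payload = json.loads(line[line.index("{"):])
    per_attribute = {
        int(match.group(1)): float(value)
        for key, value in payload.items()
        if (match := _ATTR_AUROC.match(key))
    }
    if not per_attribute:
        raise ValueError("no per-attribute AUROC keys found in the log line")
    return sum(per_attribute.values()) / len(per_attribute)


if __name__ == "__main__":
    # Synthetic line for illustration only; these are not real results.
    demo = (
        '[2024-07-21 20:48:16,450][test][INFO] - {"test_auroc": "0.8", '
        '"test_attribute_id_1_auroc": "0.5", "test_attribute_id_2_auroc": "1"}'
    )
    print(macro_auroc_from_log_line(demo))
```

Only the `test_attribute_id_*` keys are averaged, so the duplicated `test_cls_*` entries and the aggregate `test_auroc` value do not affect the result.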
Oh, got it. Thank you very much.
I am also unable to reproduce the results for the LLM modeling experiment. I believe the metrics below are the ones reported in Table 5.
wandb: test_sampled/question_type2_0_em_accuracy 0.64297
wandb: test_sampled/question_type2_1_em_accuracy 0.31034
wandb: test_sampled/question_type2_2_em_accuracy 0.2467
wandb: test_sampled/question_type2_3_em_accuracy 0.5461
wandb: test_sampled/question_type2_5_em_accuracy 0.04217
wandb: test_sampled/question_type2_6_em_accuracy 0.55181
wandb: test_sampled/question_type2_8_em_accuracy 0.00911
Might changing the thresholds from 0.5 to the Youden index help here, since the inputs were mostly misclassified by the classifier model?
Or could anything else have gone wrong? I didn't make any changes to the code other than the checkpoint.
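For reference, the Youden index idea mentioned above picks the threshold that maximizes J = sensitivity + specificity - 1, which equals TPR - FPR. The sketch below is a generic, dependency-free illustration and not the repository's code: the function name `youden_threshold` is hypothetical, it assumes binary labels in {0, 1}, and in practice the threshold would presumably be chosen per attribute on a validation split rather than on the test set.

```python
from typing import Sequence, Tuple


def youden_threshold(
    y_true: Sequence[int], y_score: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    A sample is predicted positive when its score is >= threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both positive and negative labels are required")

    best_threshold, best_j = 0.5, float("-inf")
    # Every distinct score is a candidate threshold.
    for candidate in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= candidate)
        fp = sum(1 for y, s in zip(y_true, y_score) if y != 1 and s >= candidate)
        j = tp / n_pos - fp / n_neg
        if j > best_j:  # on ties, the lowest threshold is kept
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


if __name__ == "__main__":
    # Toy labels and scores for illustration only.
    threshold, j = youden_threshold([0, 0, 1, 1], [0.1, 0.2, 0.3, 0.9])
    print(threshold, j)
```

The loop is quadratic in the number of samples, which keeps the sketch short; for large validation sets a sorted cumulative pass would be the usual choice.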
Can you confirm if you train SE-WRN model without changing any default hyper-parameters (e.g., # of epochs, learning rate, ...)?
Yes, I didn't change any parameters. Since the AUCs are good, changing the threshold might help; I will try that.
Sorry, I am not able to reproduce the scores, and I'm not even coming close. Where do you think the problem might be?
Did you sample 10% from the test set and use them for LLM evaluation?
Could you let me know which OpenAI model you are using now?
Yes, I followed each and every step in the README, although the OpenAI model I am using is GPT-4o-mini, which is supposed to be better than gpt-3.5.
If I share the weights file for SE-WRN that has been used for the experiments in the original paper, could you run the LLM experiments using it? For now, I have no credits to run the experiments from my end.
If you are okay with it, please let me know your email address.
Thanks, that would be great.
parthagrawal02@gmail.com
I've just sent an email to you. Please check it.
Received, thanks. I will update with the results.
Hi, any updates on this issue?
If not, I will close this issue.