neulab/InterpretEval

Bug: Different models generate same breakdown performance pics.

ERICMIAO0817 opened this issue · 0 comments

Describe the bug

The generated HTML report shows exactly the same results for two different models (Flair and ELMo):
WechatIMG60
Something is clearly wrong, because Flair and ELMo show an identical Break-down Performance chart.
The same bug appears in the Self-diagnosis section:
WechatIMG61

Debug and Fix

At first I thought the problem might be in the main program, but the text output was correct:

# break-down performance
Flair
eCon	0:0.8938461538461538 1:0.8558758314855874 2:0.9645621181262729 3:0.9733250620347393
tCon	0:0.8864388092613011 1:0.8766129032258065 2:0.9627551020408164 3:0.985190670122177
eFre	0:0.8963153384747216 1:0.9464209172738963 2:0.9558373414954089 3:0.960822722820764
tFre	0:0.9345070422535211 1:0.9395325203252033 2:0.9451523545706371 3:0.9270331083252974
eLen	0:0.9355970253963799 1:0.9315525876460768 2:0.8631578947368422 3:0.8507462686567164
sLen	0:0.9378569029224051 1:0.9269841269841269 2:0.9219178082191781 3:0.9319938176197835
eDen	0:0.9208025343189017 1:0.9346981997882103 2:0.9467226348078787 3:0.9323786793953858
oDen	0:0.9352896914973664 1:0.9170383586083855 2:0.89103690685413 3:0.9480401093892433
tag	0:0.9374064091045223 1:0.8423295454545454 2:0.9177877428998505 3:0.974058060531192

ELMo
eCon	0:0.8814928818776453 1:0.8501118568232663 2:0.9590263691683569 3:0.9699624530663328
tCon	0:0.8745598591549296 1:0.872125857200484 2:0.958141909137315 3:0.9838380085454208
eFre	0:0.8852177644282343 1:0.9394589952769429 2:0.9527145359019265 3:0.9527559055118111
tFre	0:0.9295774647887324 1:0.9336721728081323 2:0.9434903047091413 3:0.9157792836398838
eLen	0:0.9279887482419129 1:0.9234209055338177 2:0.8482328482328482 3:0.8467153284671532
sLen	0:0.931986531986532 1:0.9090265486725665 2:0.9142661179698217 3:0.9323017408123792
eDen	0:0.9160789844851905 1:0.9270538243626062 2:0.9272080232934325 3:0.9330677290836653
oDen	0:0.9269641734758014 1:0.907953529937444 2:0.8804920913884007 3:0.9451553930530164
tag	0:0.9340956966596449 1:0.8133903133903134 2:0.9083308450283668 3:0.9715170278637771

# self-diagnosis 
Flair
eCon	1:0.8558758314855874 3:0.9733250620347393 0.1174492305491519
tCon	1:0.8766129032258065 3:0.985190670122177 0.10857776689637044
eFre	0:0.8963153384747216 3:0.960822722820764 0.06450738434604242
tFre	3:0.9270331083252974 2:0.9451523545706371 0.01811924624533967
eLen	3:0.8507462686567164 0:0.9355970253963799 0.08485075673966347
sLen	2:0.9219178082191781 0:0.9378569029224051 0.015939094703226964
eDen	0:0.9208025343189017 2:0.9467226348078787 0.02592010048897697
oDen	2:0.89103690685413 3:0.9480401093892433 0.05700320253511337
tag	1:0.8423295454545454 3:0.974058060531192 0.13172851507664662

ELMo
eCon	1:0.8501118568232663 3:0.9699624530663328 0.11985059624306649
tCon	1:0.872125857200484 3:0.9838380085454208 0.11171215134493684
eFre	0:0.8852177644282343 3:0.9527559055118111 0.06753814108357681
tFre	3:0.9157792836398838 2:0.9434903047091413 0.02771102106925749
eLen	3:0.8467153284671532 0:0.9279887482419129 0.08127341977475966
sLen	1:0.9090265486725665 3:0.9323017408123792 0.02327519213981266
eDen	0:0.9160789844851905 3:0.9330677290836653 0.01698874459847477
oDen	2:0.8804920913884007 3:0.9451553930530164 0.06466330166461576
tag	1:0.8133903133903134 3:0.9715170278637771 0.15812671447346371
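As a quick sanity check on the text output, the sketch below parses one break-down line per model and recomputes the self-diagnosis entry from it. Judging from the numbers above, each self-diagnosis line appears to list the worst bucket, the best bucket, and the gap between them; that reading is an assumption, and `parse_breakdown` and `self_diagnosis` are hypothetical helpers written for illustration, not functions from the repository.

```python
def parse_breakdown(text):
    """Parse lines like 'eCon<TAB>0:0.89 1:0.85' into {attribute: {bucket: score}}."""
    result = {}
    for line in text.strip().splitlines():
        attr, *pairs = line.split()
        result[attr] = {
            int(bucket): float(score)
            for bucket, score in (pair.split(":") for pair in pairs)
        }
    return result


def self_diagnosis(buckets):
    """Return (worst bucket, best bucket, score gap) for one attribute (assumed format)."""
    worst = min(buckets, key=buckets.get)
    best = max(buckets, key=buckets.get)
    return worst, best, buckets[best] - buckets[worst]


# eCon lines copied from the text output above.
flair = parse_breakdown(
    "eCon\t0:0.8938461538461538 1:0.8558758314855874 "
    "2:0.9645621181262729 3:0.9733250620347393"
)
elmo = parse_breakdown(
    "eCon\t0:0.8814928818776453 1:0.8501118568232663 "
    "2:0.9590263691683569 3:0.9699624530663328"
)

print(self_diagnosis(flair["eCon"]))
print(self_diagnosis(elmo["eCon"]))
print("models differ:", flair != elmo)
```

The two models give different values here, so identical charts cannot be coming from the numbers themselves.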

So the math is correct. After some digging, I found the actual problem in genFig.py (lines 467-489):

    elif block.find("break-down performance") != -1:
        metaInfo_m1 = extValue(block, model_name1+":\n", "\n\n")
        metaInfo_m2 = extValue(block, model_name2+":\n", "\n\n")
        dict_breakdown_m1 = str2dict(metaInfo_m1)
        dict_breakdown_m2 = str2dict(metaInfo_m2)


    elif block.find("self-diagnosis") != -1:
        metaInfo_m1 = extValue(block, model_name1+":\n", "\n\n")
        metaInfo_m2 = extValue(block, model_name2+":\n", "\n\n")
        dict_self_diag_m1 = str2dict(metaInfo_m1)
        dict_self_diag_m2 = str2dict(metaInfo_m2)

    elif block.find("aided-diagnosis line-chart") != -1:
        metaInfo_m1_2 = extValue(block, model_name1+"_"+model_name2+ ":\n", "\n\n")
        dict_aided_diag_hist_m1_2 = str2dict(metaInfo_m1_2)




    elif block.find("aided-diagnosis heatmap") != -1:
        metaInfo_m1_2 = extValue(block, model_name1+"_"+model_name2+ ":\n", "\n\n")
        dict_aided_diag_heatmap_m1_2 = str2dict(metaInfo_m1_2)

extValue() is called with a start marker such as model_name1+":\n". However, if you look at the block (the first argument passed to this function), the model name is not followed by ':\n'; it is followed only by '\n', so the marker with the colon does not occur in the block.
After removing all the colons, the code block becomes:

    elif block.find("break-down performance") != -1:
        metaInfo_m1 = extValue(block, model_name1+"\n", "\n\n")
        metaInfo_m2 = extValue(block, model_name2+"\n", "\n\n")
        dict_breakdown_m1 = str2dict(metaInfo_m1)
        dict_breakdown_m2 = str2dict(metaInfo_m2)


    elif block.find("self-diagnosis") != -1:
        metaInfo_m1 = extValue(block, model_name1+"\n", "\n\n")
        metaInfo_m2 = extValue(block, model_name2+"\n", "\n\n")
        dict_self_diag_m1 = str2dict(metaInfo_m1)
        dict_self_diag_m2 = str2dict(metaInfo_m2)

    elif block.find("aided-diagnosis line-chart") != -1:
        metaInfo_m1_2 = extValue(block, model_name1+"_"+model_name2+ "\n", "\n\n")
        dict_aided_diag_hist_m1_2 = str2dict(metaInfo_m1_2)




    elif block.find("aided-diagnosis heatmap") != -1:
        metaInfo_m1_2 = extValue(block, model_name1+"_"+model_name2+ "\n", "\n\n")
        dict_aided_diag_heatmap_m1_2 = str2dict(metaInfo_m1_2)
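To illustrate why the colon matters, here is a minimal sketch. `ext_value_sketch` is a hypothetical stand-in for extValue, whose real implementation is not shown in this issue; it assumes the helper returns the text between a start marker and an end marker, and an empty string when the start marker is missing. The block is a shortened copy of the text output above.

```python
def ext_value_sketch(block, start, end):
    """Hypothetical stand-in for extValue: text between `start` and `end`.

    Returns '' when `start` is not found (an assumption about the real helper).
    """
    begin = block.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    stop = block.find(end, begin)
    return block[begin:] if stop == -1 else block[begin:stop]


# Shortened block in the same layout as the text output: the model name
# sits on its own line and is followed by '\n', not ':\n'.
block = (
    "# break-down performance\n"
    "Flair\n"
    "eCon\t0:0.8938461538461538 1:0.8558758314855874\n"
    "\n"
    "ELMo\n"
    "eCon\t0:0.8814928818776453 1:0.8501118568232663\n"
    "\n"
)

for name in ("Flair", "ELMo"):
    with_colon = ext_value_sketch(block, name + ":\n", "\n\n")
    without_colon = ext_value_sketch(block, name + "\n", "\n\n")
    print(name, "with colon:", repr(with_colon))
    print(name, "without colon:", repr(without_colon))
```

Under these assumptions the marker with the colon extracts nothing for either model, while the marker without it extracts each model's own lines.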

With this change, the figures are generated correctly:
WechatIMG62
WechatIMG63

Because Flair and ELMo perform almost the same, the difference is hard to see here, but with other models it is obvious.
I will submit a pull request for this fix. It's not a big deal, but I am really happy to be a part of this gorgeous project!