YoungXiyuan/DCA

Why is the loss always negative?

BeerTai opened this issue · 1 comment

Hi, thanks for your work.
I ran your code with the default arguments, following the Reinforcement Learning command in the README: python main.py --mode train --order offset --model_path model --method RL
Why is the loss always negative?

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
epoch 0 total loss -5881.828728429389 -6.171908424375015
1906
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
epoch 1 total loss -5041.10081607045 -5.289717540472664
2859
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
epoch 2 total loss -4401.4143905708315 -4.6184830960869165
3812
[0, 1, 2, 3, 4, 5, 6, 7]
epoch 3 total loss -3761.1239844140346 -3.9466148839601622
4765
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
epoch 4 total loss -3198.0963033663556 -3.3558198356415065
aida-A micro F1: 0.7510698256966912
aida-B micro F1: 0.7348940914158304
msnbc micro F1: 0.8967100229533282
aquaint micro F1: 0.8041958041958042
ace2004 micro F1: 0.841046277665996
clueweb micro F1: 0.6639417894358606
wikipedia micro F1: 0.6103098883218696
att_mat_diag tensor(17.4249, device='cuda:0')
tok_score_mat_diag tensor(17.3656, device='cuda:0')
ment_att_mat_diag tensor(17.3205, device='cuda:0')
ment_score_mat_diag tensor(17.3205, device='cuda:0')
entity2entity_mat_diag tensor(17.3725, device='cuda:0')
entity2entity_score_mat_diag tensor(17.4487, device='cuda:0')
knowledge2entity_mat_diag tensor(17.2840, device='cuda:0')
knowledge2entity_score_mat_diag tensor(17.3483, device='cuda:0')
ment2ment_mat_diag tensor(17.3205, device='cuda:0')
ment2ment_score_mat_diag tensor(17.3205, device='cuda:0')
f - l1.w, b tensor(5.8933, device='cuda:0') tensor(2.7462, device='cuda:0')
f - l2.w, b tensor(0.6788, device='cuda:0') tensor(0.0029, device='cuda:0')
5718
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
epoch 5 total loss -2894.0255648259754 -3.036752953647403
6671
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
epoch 6 total loss -2617.967075814433 -2.7470798277171387
7624
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
epoch 7 total loss -2439.785110020892 -2.560110293830946
8577
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
epoch 8 total loss -2333.8847663235483 -2.4489871629837863
9530
[0, 1, 2, 3, 4]
epoch 9 total loss -1700.5407024365093 -1.784407872441248
aida-A micro F1: 0.7527397975159169
aida-B micro F1: 0.7409141583054627
msnbc micro F1: 0.9074215761285387
aquaint micro F1: 0.8167832167832169
ace2004 micro F1: 0.8450704225352113
clueweb micro F1: 0.673913043478261
wikipedia micro F1: 0.6227350048073367
att_mat_diag tensor(17.6288, device='cuda:0')
tok_score_mat_diag tensor(17.5042, device='cuda:0')
ment_att_mat_diag tensor(17.3205, device='cuda:0')
ment_score_mat_diag tensor(17.3205, device='cuda:0')
entity2entity_mat_diag tensor(17.6331, device='cuda:0')
entity2entity_score_mat_diag tensor(17.6095, device='cuda:0')
knowledge2entity_mat_diag tensor(17.3107, device='cuda:0')
knowledge2entity_score_mat_diag tensor(17.3347, device='cuda:0')
ment2ment_mat_diag tensor(17.3205, device='cuda:0')
ment2ment_score_mat_diag tensor(17.3205, device='cuda:0')

Thank you for your interest in our work and sorry for my late reply.

In the reinforcement learning setting, the loss you mention is actually the policy loss (please see mulrel_ranker.py#L688 and mulrel_ranker.py#L653).

The theoretical explanation of how that policy loss is computed is given in Section 4.2 (Reward) of our paper. The loss is negative because the agent is designed to receive non-positive expected rewards in each episode: wrong decisions are penalized, so an episode's reward can be at most zero. The purpose of this design is to train the agent to make as many correct decisions as possible, that is, to maximize the expected reward toward zero.
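To make the sign of the loss concrete, here is a minimal REINFORCE-style sketch. It is not the repository's actual implementation (the real computation is in mulrel_ranker.py at the lines linked above); the reward values and the surrogate-loss convention below are illustrative assumptions only.

```python
import torch

# Log-probabilities of the actions the agent took in one episode.
# These are always <= 0, since probabilities are <= 1.
log_probs = torch.log(torch.tensor([0.6, 0.3, 0.8, 0.5]))

# Rewards designed to be non-positive (an assumption for illustration):
# 0 for a correct decision, -1 for a wrong one.
rewards = torch.tensor([0.0, -1.0, 0.0, -1.0])

# REINFORCE surrogate loss: -sum(log_prob * reward).
# Since log_prob <= 0 and reward <= 0, each product is >= 0,
# so the negated sum is <= 0, which is why the logged loss is negative.
policy_loss = -(log_probs * rewards).sum()
print(policy_loss)  # tensor(-1.8971), a negative loss
```

As the policy improves and more decisions become correct, more rewards are zero, so the loss shrinks toward zero from below. This matches the trend in your training log, where the total loss moves from roughly -5881 at epoch 0 to about -1700 at epoch 9.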

Feel free to let me know if you have any questions (: