ljynlp/W2NER

Distance Embedding

kizunasunhy opened this issue · 4 comments

Could you please explain why the distance embedding looks like this?

array([[19, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13],
       [ 1, 19, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13],
       [ 2,  1, 19, 10, 11, 11, 12, 12, 12, 12, 13, 13],
       [ 2,  2,  1, 19, 10, 11, 11, 12, 12, 12, 12, 13],
       [ 3,  2,  2,  1, 19, 10, 11, 11, 12, 12, 12, 12],
       [ 3,  3,  2,  2,  1, 19, 10, 11, 11, 12, 12, 12],
       [ 3,  3,  3,  2,  2,  1, 19, 10, 11, 11, 12, 12],
       [ 3,  3,  3,  3,  2,  2,  1, 19, 10, 11, 11, 12],
       [ 4,  3,  3,  3,  3,  2,  2,  1, 19, 10, 11, 11],
       [ 4,  4,  3,  3,  3,  3,  2,  2,  1, 19, 10, 11],
       [ 4,  4,  4,  3,  3,  3,  3,  2,  2,  1, 19, 10],
       [ 4,  4,  4,  4,  3,  3,  3,  3,  2,  2,  1, 19]])

Thank you.

This is a distance index matrix, not the embedding itself. Each index is used to look up the corresponding distance embedding.
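A minimal sketch of that lookup in plain Python. It assumes a 20-row table (indices 0 to 19, the range that appears in the matrix) and a hypothetical embedding size of 4. Zeroing row 0 for padding is also an assumption. In a framework the table would normally be a learned layer such as PyTorch's `torch.nn.Embedding`. The helper names are illustrative and are not W2NER's code.

```python
import random

NUM_DIST_INDICES = 20  # indices 0..19 appear in the matrix
DIST_EMB_DIM = 4       # hypothetical embedding size, for illustration only


def make_embedding_table(num_rows, dim, seed=0):
    """Randomly initialised lookup table (hypothetical helper).

    Row 0 is zeroed because index 0 is reserved for padding (assumption).
    """
    rng = random.Random(seed)
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]
    table[0] = [0.0] * dim
    return table


def lookup(table, index_matrix):
    """Replace every distance index with its vector: L x L -> L x L x dim."""
    return [[table[idx] for idx in row] for row in index_matrix]


# Top-left 3 x 3 corner of the index matrix from the issue.
dist_idx = [[19, 10, 11],
            [1, 19, 10],
            [2, 1, 19]]

table = make_embedding_table(NUM_DIST_INDICES, DIST_EMB_DIM)
emb = lookup(table, dist_idx)
print(len(emb), len(emb[0]), len(emb[0][0]))  # 3 3 4
# Pairs with the same index (same bucketed distance) share one vector.
print(emb[0][1] is emb[1][2])  # True
```

Because the vectors are shared per index rather than per token pair, the model only has to learn 20 distance vectors regardless of sentence length.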

Oh yes, sorry for the mistake. But why is it organized by powers of 2, and why is the number on the diagonal 19?

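The distance indices are bucketed by powers of 2 (distance 1, 2-3, 4-7, 8-15, ...) to avoid data sparsity. Token pairs that are far apart occur with low frequency, so giving every long distance its own index would leave those embeddings with too few training examples. Index 0 is reserved for padding, so I use 19 for the diagonal (distance 0) instead.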
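The matrix in the issue can be reproduced with a short sketch. It treats entry (i, j) as encoding the signed distance i - j. Positive distances map to buckets 1-9, and the opposite direction is shifted by 9 (so 10-18), which matches the offset visible between the lower and upper triangles. Distance 0 maps to 19. Capping at 9 buckets per direction is an assumption consistent with 19 = 2 * 9 + 1. The function names are hypothetical, and the actual W2NER code may compute this differently, for example with a precomputed lookup array.

```python
MAX_BUCKET = 9     # assumption: 9 buckets per direction (indices 1..9 and 10..18)
DIAGONAL_IDX = 19  # distance 0 would otherwise collide with padding index 0


def bucket(d):
    """Absolute distance d >= 1 -> 1, 2-3 -> 2, 4-7 -> 3, 8-15 -> 4, ...

    int.bit_length() equals floor(log2(d)) + 1; the result is capped at MAX_BUCKET.
    """
    return min(d.bit_length(), MAX_BUCKET)


def distance_index_matrix(length):
    """Hypothetical helper: entry (i, j) encodes the signed distance i - j."""
    mat = []
    for i in range(length):
        row = []
        for j in range(length):
            d = i - j
            if d == 0:
                row.append(DIAGONAL_IDX)
            elif d > 0:   # lower triangle: token j precedes token i
                row.append(bucket(d))
            else:         # upper triangle: token j follows token i, shifted by 9
                row.append(bucket(-d) + MAX_BUCKET)
        mat.append(row)
    return mat


def pairs_per_bucket(length):
    """Count pairs with i - j = d (there are length - d of them), summed per bucket."""
    counts = {}
    for d in range(1, length):
        b = bucket(d)
        counts[b] = counts.get(b, 0) + (length - d)
    return counts


for row in distance_index_matrix(12):
    print(row)  # prints the same 12 x 12 matrix as in the issue

print(pairs_per_bucket(12))  # {1: 11, 2: 19, 3: 26, 4: 10}
```

The last line illustrates the sparsity argument for a 12-token sentence: distance 11 occurs in only one pair, but bucket 4 pools distances 8-11 into 10 pairs, so its embedding receives more training signal.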


Thank you!