YoungXiyuan/DCA

Why are existing candidates dropped in `find_coref`?

peblair opened this issue · 1 comment

Hello,

I am trying to understand the `with_coref` and `find_coref` functions in the dataset loader. Roughly speaking, it appears that the goal of `find_coref` is to do the following (in pseudo-code):

find_coref(cur_m) :=
  for each mention m in the same document as cur_m:
    if m's mention text starts or ends with the same text as cur_m BUT is not equal to cur_m:
      add all of m's candidates to the result list (removing duplicates)
  return the collected candidates

The results of `find_coref` are then used to overwrite `cur_m`'s candidate list. This is a bit confusing to me, though, since the BUT ... condition above means that the candidates which were previously in `cur_m`'s own candidate list are lost (or at least potentially lost). Is this intentional? If so, can you explain what `with_coref` is intended to accomplish?
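To make the question concrete, here is a minimal Python sketch of the behavior as I understand it from the pseudo-code above. It is not the repository's actual implementation: the function names `find_coref_candidates` and `with_coref_overwrite`, the dict layout of a mention, the case-insensitive comparison, and the choice to overwrite only when something was collected are all my assumptions. The two mentions are a toy document built from a few scores in the error output below.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]   # (entity name, prior score)
Mention = Dict[str, object]     # assumed layout: {"mention": str, "candidates": [Candidate, ...]}


def find_coref_candidates(cur_m: Mention, doc_mentions: List[Mention]) -> List[Candidate]:
    """Collect candidates of other mentions whose text starts or ends with cur_m's text."""
    cur_text = str(cur_m["mention"]).lower()
    seen = set()
    collected: List[Candidate] = []
    for m in doc_mentions:
        text = str(m["mention"]).lower()
        if text == cur_text:
            # The "BUT not equal" condition: identical mentions (including cur_m itself) are skipped.
            continue
        if text.startswith(cur_text) or text.endswith(cur_text):
            for name, score in m["candidates"]:
                if name not in seen:  # remove duplicates, keep first occurrence
                    seen.add(name)
                    collected.append((name, score))
    return collected


def with_coref_overwrite(doc_mentions: List[Mention]) -> None:
    """Overwrite each mention's candidates with the collected ones (assumed: only if non-empty)."""
    for cur_m in doc_mentions:
        collected = find_coref_candidates(cur_m, doc_mentions)
        if collected:
            cur_m["candidates"] = collected


# Toy document: a short mention and a longer mention that ends with it.
doc = [
    {"mention": "Teresa",
     "candidates": [("Teresa", 0.364), ("Mother_Teresa", 0.026)]},
    {"mention": "Mother Teresa",
     "candidates": [("Mother_Teresa", 1.0), ("Mother_Teresa_High_School", 0.001)]},
]
with_coref_overwrite(doc)
print(doc[0]["candidates"])
```

In this sketch the short mention ends up with only the candidates of the longer mention, so the entity `Teresa` is no longer in its list, which is the situation I hit below.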

For example, on a local modification of this repository, I found that the gold entity (Teresa) is dropped from the list of candidates (I've verified in the AIDA train CSV [line 2426] that this is indeed the correct gold entity for this mention):

RuntimeError: Failed to find gold_key 'Teresa' in list: [(0, ('Mother_Teresa', 1.0)), (1, ('Mother_Teresa_High_School', 0.001)), (2, ('The_Missionary_Position', 0.001)), (3, ('Blessed_Mother_Teresa_Catholic_Secondary_School', 0.0))]
orig list: [['Teresa', 0.364], ['Teresa_(Barbie)', 0.138], ['Teresa,_Rizal', 0.115], ['Teresa_Nielsen_Hayden', 0.103], ['Teresa_of_Ávila', 0.092], ['Teresa_Heinz', 0.038], ['Teresa,_Castellón', 0.031], ['Teresa,_Greater_Poland_Voivodeship', 0.029], ['Mother_Teresa', 0.026], ['Teresa_Scanlan', 0.021], ['Teresa_Teng', 0.018], ['Theresa,_Countess_of_Portugal', 0.018], ['George_McGovern', 0.015], ['Teresa_Crippen', 0.013], ['Teresa_Palmer', 0.012], ['Teresa_Cristina_of_the_Two_Sicilies', 0.01], ['Teresa_Earnhardt', 0.01], ['Teresa_Wynn_Roseborough', 0.009], ['Teresa_(2010_telenovela)', 0.009], ['The_Real_Housewives_of_New_Jersey', 0.008], ['Teresa_(film)', 0.008], ['Teresa_Jungman', 0.008], ['Teresa_Bagioli_Sickles', 0.007], ['Teresa_Fernández_de_Traba', 0.007], ['Teresa_Bryant', 0.007], ['Teresa,_Contessa_Guiccioli', 0.007], ['Teresa_Strasser', 0.006], ['Teresa_Vaill', 0.006], ['Teresa_Mak', 0.006], ['Teresa_Murphy', 0.006], ['Teresa_Cheung_(actress)', 0.006], ['Teresa_Rivera', 0.006], ['Teresa_Nzola_Meso_Ba', 0.006], ['Tracy_Bond', 0.006], ['Teresa_Medina', 0.006], ['Infanta_Maria_Teresa_of_Spain', 0.006], ['Teresa_Seiblitz', 0.006], ['Teresa_Forcier', 0.006], ['Teresa_Taylor', 0.006], ['Teresa_Motos', 0.006], ['Teresa_Piotrowska', 0.006], ['Teresa_Ferster_Glazier', 0.006], ['Teresa_Fedor', 0.006], ['Teresa_Ganzel', 0.006], ['Teresa_Portela_(Portuguese_canoeist)', 0.006], ['Teresa_de_la_Parra', 0.006], ['Teresa_Piccini', 0.006], ['Teresa_Borawska', 0.006], ['Princess_Maria_Teresa_of_Savoy', 0.006], ['Teresa_Roncon', 0.006], ['Teresa_Wentzler', 0.006], ['Teresa_Machado', 0.006], ['Teresa_Magbanua', 0.006], ['Teresa_del_Po', 0.006], ['Teresa_Sapieha', 0.006], ['Teresa_Edwards', 0.006], ['Teresa_A._Dolan', 0.006], ['Teresa_Hurtado_de_Ory', 0.006], ['Teresa_De_Sio', 0.006], ['Teresa_Hsu_Chih', 0.006], ['Lady_Teresa_Waugh', 0.006], ['Teresa_Lourenco', 0.006], ['Teresa_Lubomirska', 0.006], ['Teresio_Maria_Languasco', 0.006], ['Teresa_Woo-Paw', 0.006], 
['Teresa_de_Cartagena', 0.006], ['Teresa_Bernabe', 0.006], ['Teresa_Amabile', 0.006], ['Maria_Teresa,_Princess_of_Beira', 0.006], ['Teresa_Korwin_Gosiewska', 0.006], ['Teresa_Bright', 0.006], ['Teresa_Daly', 0.006], ['Teresa_Villaverde', 0.006], ['Teresa_Stich-Randall', 0.006], ['Teresa_Polias', 0.006], ['Teresa_Wong', 0.006], ['Teresa_Pavlinek', 0.006], ['Teresa_Ruiz_(politician)', 0.006], ['Teresa_Cooper', 0.006], ['Teresa_Carr_Deni', 0.006], ['Teresa_P._Pica', 0.006], ['Teresa_S._Polley', 0.006], ['Teresa_Stratas', 0.006], ['Teresa_Lipowska', 0.006], ['Teresa_Carpio', 0.006], ['Teresa_Stolz', 0.006], ['Teresa_Wilson', 0.006], ['Teresa_Lalor', 0.006], ['Teresa_Hannigan', 0.006], ['Teresa_Chodkiewicz', 0.006], ['Teresa_Lisbon', 0.006], ['Teresa_Forn', 0.006], ['Teresa_Gutierrez', 0.006], ['Teresa_Maxwell-Conover', 0.006], ['Teresa_Ann_Savoy', 0.006], ['Teresa_Trull', 0.006], ['Teresa_Forcades', 0.006], ['Teresa_Lynch', 0.006], ['Teresa_Furtado', 0.006], ['Teresa_Southwick', 0.006]]

Any help on understanding this would be very useful. Thanks!

Sorry for my late reply and thank you for your interest in our work.

  1. As to your first question (the meaning of the `with_coref` and `find_coref` functions), I suggest referring to the paper Deep Joint Entity Disambiguation with Local Neural Attention (please read the third paragraph of Section 6, Candidate Selection).

  2. As to your second question about the Teresa example, my explanation is that any coreference method can introduce some loss (here, dropping the gold candidate for a mention), even though it may improve accuracy overall.
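
One way to put a number on the loss mentioned in point 2 is to compare gold recall (the fraction of mentions whose gold entity is still among their candidates) before and after the coreference step. The sketch below is only an illustration: the helper name `gold_recall`, the `"gold"` key, and the single toy mention are assumptions, not part of this repository.

```python
from typing import Dict, List


def gold_recall(mentions: List[Dict[str, object]]) -> float:
    """Fraction of mentions whose gold entity appears in their candidate list."""
    if not mentions:
        return 0.0
    hits = sum(
        1
        for m in mentions
        if any(name == m["gold"] for name, _ in m["candidates"])
    )
    return hits / len(mentions)


# Toy data mirroring the Teresa example: the same mention before and after
# its candidate list is overwritten by the coreference step.
before = [{"gold": "Teresa",
           "candidates": [("Teresa", 0.364), ("Mother_Teresa", 0.026)]}]
after = [{"gold": "Teresa",
          "candidates": [("Mother_Teresa", 1.0), ("Mother_Teresa_High_School", 0.001)]}]

recall_before = gold_recall(before)
recall_after = gold_recall(after)
print(recall_before, recall_after)
```

Running the same comparison over a full dataset would show how often the gold entity is dropped, which can then be weighed against any change in final disambiguation accuracy.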