4AI/TDEER

About the data statistics in Table 1


I have two questions about this table:
1. Why does the sum of EPO, SEO, and Normal not equal the total over the 5 kinds of triples for NYT and WebNLG?
2. For the NYT11 dataset, why is the SEO count an odd number (1)? Since there is no EPO, I would expect it to be at least 2, an even number.

@AndDoIt Sorry for the delayed reply.

First, thanks for following the project. We processed the dataset following CasRel.

For Q1: This is expected, since SEO and EPO can both occur in the same triple list. See the following example:

{
    "text": "Ampara Hospital is located in the Eastern Province of Sri Lanka , where the currency is the Ski Lankan rupee . One of the leaders of Sri Lanka is Ranil Wickremesinghe .",
    "triple_list": [
        [
            "Hospital",
            "country",
            "Lanka"
        ],
        [
            "Lanka",
            "leaderName",
            "Wickremesinghe"
        ],
        [
            "Lanka",
            "currency",
            "rupee"
        ],
        [
            "Hospital",
            "state",
            "Lanka"
        ]
    ]
}

EPO: (Hospital, country, Lanka), (Hospital, state, Lanka)
SEO: (Hospital, state, Lanka), (Lanka, currency, rupee), ....

For Q2: You're right that there is no EPO in the NYT11-HRL dataset, and the only SEO sample is:

{
    "text": " The United States previously offered to locate the missile system in the Czech Republic and Poland , drawing furious objections from Russia , though Washington argues that the system is not built to defend against Russia but against Iran , principally , and other potential threats . ",
    "triple_list": [
        [
            "Washington",
            "/location/administrative_division/country",
            "Iran"
        ],
        [
            "Russia",
            "/location/administrative_division/country",
            "Iran"
        ]
    ]
}

You can check the code to see how to classify Normal, SEO, and EPO triples.
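
As a rough illustration of that classification (a minimal sketch, not the repository's actual code), assume the common convention that labels are assigned per sentence: a sentence is Normal when no entity is shared between its triples, EPO when the same (subject, object) pair occurs more than once, and SEO when distinct pairs share an entity. The function name `classify` and the variable names are hypothetical.

```python
from collections import Counter


def classify(triple_list):
    """Return the overlap labels for one sentence (assumed convention)."""
    pairs = [(subj, obj) for subj, _, obj in triple_list]
    entities = {e for pair in pairs for e in pair}
    # Normal: every triple uses two entities that appear nowhere else.
    if len(entities) == 2 * len(pairs):
        return {"Normal"}
    labels = set()
    unique_pairs = set(pairs)
    # EPO: at least one (subject, object) pair is repeated.
    if len(unique_pairs) != len(pairs):
        labels.add("EPO")
    # SEO: distinct pairs share at least one entity.
    unique_entities = {e for pair in unique_pairs for e in pair}
    if len(unique_entities) != 2 * len(unique_pairs):
        labels.add("SEO")
    return labels


# Triple lists copied from the two samples in this thread.
webnlg_triples = [
    ["Hospital", "country", "Lanka"],
    ["Lanka", "leaderName", "Wickremesinghe"],
    ["Lanka", "currency", "rupee"],
    ["Hospital", "state", "Lanka"],
]
nyt11_triples = [
    ["Washington", "/location/administrative_division/country", "Iran"],
    ["Russia", "/location/administrative_division/country", "Iran"],
]

counts = Counter()
for triples in (webnlg_triples, nyt11_triples):
    counts.update(classify(triples))

print(sorted(classify(webnlg_triples)))  # ['EPO', 'SEO']
print(sorted(classify(nyt11_triples)))   # ['SEO']
print(counts["Normal"], counts["EPO"], counts["SEO"])  # 0 1 2
```

Under these assumptions the two sentences produce three labels in total, so the category counts can add up to more than the number of sentences. The NYT11 sentence also counts as a single SEO sample even though it contains two triples.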

Thank you very much for the kind and detailed reply! After following the code you pointed me to, I see that I had mistaken the numbers for triple counts.