ExplainableML/czsl

Role of closed_mask in evaluator vs dataset pairs

I find that CGE and CompCos predict over all pairs (train pairs + val pairs + test pairs) at test time, but some pairs are masked out for evaluation (cf. closed_mask in lines 247-252 and lines 300-306 of models/common.py). Is there any special meaning to this?

Hi @care77,

closed_mask is needed to perform closed-world evaluation (i.e. excluding pairs not present in the test set).
Specifically, in the mask, a value for a pair is 1 if the pair should be considered and 0 if it should not be:

  • In the open-world case, we set all the mask values to one, considering all possible pairs (line 248);
  • in the closed-world case, we mask out compositions not present in the dataset (line 250).
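
The two cases above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in models/common.py: the helper names `build_closed_mask` and `masked_argmax`, the toy pairs, and the scores are all hypothetical.

```python
# Hypothetical helpers illustrating the idea of a closed-world mask.
# None of these names come from the czsl codebase.

def build_closed_mask(all_pairs, eval_pairs, open_world=False):
    """Return one 0/1 entry per pair: 1 = consider it, 0 = mask it out."""
    if open_world:
        # Open world: every possible pair stays a candidate.
        return [1] * len(all_pairs)
    eval_set = set(eval_pairs)
    # Closed world: keep only the pairs allowed for this evaluation.
    return [1 if pair in eval_set else 0 for pair in all_pairs]


def masked_argmax(scores, mask, fill=float("-inf")):
    """Index of the best-scoring pair among those the mask keeps."""
    masked = [s if m else fill for s, m in zip(scores, mask)]
    return max(range(len(masked)), key=masked.__getitem__)


# Toy data (made up for the example).
all_pairs = [("red", "apple"), ("green", "apple"), ("red", "car"), ("green", "car")]
eval_pairs = [("red", "apple"), ("green", "car")]
scores = [0.2, 0.9, 0.4, 0.5]  # one score per entry of all_pairs

closed_mask = build_closed_mask(all_pairs, eval_pairs)
open_mask = build_closed_mask(all_pairs, eval_pairs, open_world=True)

closed_pred = all_pairs[masked_argmax(scores, closed_mask)]
open_pred = all_pairs[masked_argmax(scores, open_mask)]
```

With these toy scores, the open-world prediction is the globally best pair, while the closed-world prediction is the best pair among those the mask keeps, so the two can differ.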

I hope this answers your question but, in case it does not, please let me know.

P.S. For faster open-world (OW) evaluation, consider using the KG-SP and Co-CGE repos.

@mancinimassimiliano Thank you for your answers.
I'm not sure about the definition of "compositions not present in the dataset".
For example, on MIT-States, there are 1262 seen pairs in the train set, 300 seen and 300 unseen pairs in the val set, and 400 seen and 400 unseen pairs in the test set. For evaluation on the val set, CGE returns scores over all pairs (1262 + 300 + 400), but the 400 unseen test-set pairs are masked out in the evaluator by closed_mask. In this case, are the masked-out pairs the "compositions not present in the dataset"? If so, why not just return scores for the seen pairs (1262) and the unseen val-set pairs (300)?
Looking forward to your answer.
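
For reference, the pair counts in this question can be tallied as below. This is only arithmetic on the numbers quoted above, taken at face value; the variable names are hypothetical and nothing here is read from the dataset itself.

```python
# Pair counts as quoted in the question (MIT-States), taken at face value.
n_train_seen = 1262   # seen pairs in the train set
n_val_unseen = 300    # unseen pairs in the val set
n_test_unseen = 400   # unseen pairs in the test set

# Pairs the model scores at inference time.
n_scored = n_train_seen + n_val_unseen + n_test_unseen

# Pairs still considered after masking when evaluating on the val set.
n_kept_val = n_train_seen + n_val_unseen

# Pairs the mask removes during val evaluation.
n_masked_val = n_scored - n_kept_val

print(n_scored, n_kept_val, n_masked_val)  # 1962 1562 400
```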

Thanks @care77 for the clarification!

OK, I see your point, and the answer is that... there is no particular reason. :)

The main one is historical: the codebase builds on top of the AoP one, where the closed mask is also used to filter pairs (see here).

In my opinion, filtering in the evaluator lets the model ignore the evaluation procedure. This means that we do not need to define one inference function for each phase (e.g. one for validation and one for testing), which is the main advantage code-wise. Other than that, I have no strong arguments in favor of either choice.
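
A minimal sketch of that design choice, assuming a toy setup: a single phase-agnostic `infer` function scores every pair, and a per-phase evaluator decides which pairs count. The names `infer` and `PhaseEvaluator` and the toy pairs are hypothetical and are not taken from the czsl or AoP code.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]

# Toy pair sets (made up for the example).
TRAIN: List[Pair] = [("old", "car")]
VAL: List[Pair] = [("new", "car")]
TEST: List[Pair] = [("old", "tree")]
ALL_PAIRS: List[Pair] = TRAIN + VAL + TEST


def infer(features: Sequence[float]) -> List[float]:
    """Hypothetical inference: one score per pair in ALL_PAIRS, whatever the phase."""
    return list(features)  # stand-in for a real forward pass


class PhaseEvaluator:
    """Keeps the phase-specific filtering out of the model."""

    def __init__(self, all_pairs: Sequence[Pair], train_pairs: Sequence[Pair],
                 phase_pairs: Sequence[Pair]) -> None:
        allowed = set(train_pairs) | set(phase_pairs)
        self.all_pairs = list(all_pairs)
        self.mask = [pair in allowed for pair in self.all_pairs]

    def predict(self, scores: Sequence[float]) -> Pair:
        # Only pairs kept by the mask compete for the prediction.
        kept = (i for i, keep in enumerate(self.mask) if keep)
        best = max(kept, key=lambda i: scores[i])
        return self.all_pairs[best]


scores = infer([0.1, 0.3, 0.8])  # the same call serves both phases
val_pred = PhaseEvaluator(ALL_PAIRS, TRAIN, VAL).predict(scores)
test_pred = PhaseEvaluator(ALL_PAIRS, TRAIN, TEST).predict(scores)
```

Here the same `scores` give different predictions in the two phases only because each evaluator applies its own mask; the inference code does not change.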

Hope this helps!

I get it. Thank you @mancinimassimiliano for your helpful answers.

You are welcome!