google-research/smore

Code for generating candidate entities during evaluation for WikiKG

chocolate9624 opened this issue · 2 comments

Hi, I noticed the evaluation data for WikiKG is loaded in this line:

`def load_1p_eval_data(args, phase):`

However, I cannot find the code for generating candidate tail entities during evaluation. Can you show more details?

Thanks a lot!

hyren commented

Hi, I have updated the preprocess code to download the candidate set. Please check here.

Could you give more details about how to generate the candidates of valid data?

valid_url = "https://snap.stanford.edu/smore/valid.pt"
I see this comment in the code: "# Specifically designed for OGB-LSC WikiKG v2. Since no candidates are provided by the original dataset, we generate candidates based on heuristics such as degrees / entity types."
But I can't reproduce this: the candidate sets I generate per relation differ slightly from yours (by roughly < 1%).

In my opinion, the logic is:

`train_data.groupby('relation')['tail'].apply(lambda grp: list(grp.value_counts().nlargest(20000).index))`

Is that right?
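For reference, the degree-based heuristic in that one-liner can be sketched as below. This is a hypothetical reconstruction, not the repo's actual code: the comment mentions additional signals such as entity types, which could explain the ~1% mismatch. The function name and the toy DataFrame are my own.

```python
import pandas as pd

def top_k_tails_per_relation(train_df: pd.DataFrame, k: int = 20000) -> dict:
    """For each relation, keep the k most frequent tail entities seen in the
    training triples as evaluation candidates (pure degree heuristic).

    NOTE: hypothetical sketch; the SMORE repo may combine this with other
    heuristics (e.g. entity types), so results can differ slightly.
    """
    return (
        train_df.groupby("relation")["tail"]
        # value_counts() sorts tails by frequency; nlargest(k) keeps the top k
        .apply(lambda grp: list(grp.value_counts().nlargest(k).index))
        .to_dict()
    )

# Toy example: relation 0 has tails 5 (x3), 7 (x2), 9 (x1); keep the top 2.
train = pd.DataFrame({
    "head":     [0, 1, 2, 3, 4, 5, 6],
    "relation": [0, 0, 0, 0, 0, 0, 1],
    "tail":     [5, 5, 5, 7, 7, 9, 8],
})
print(top_k_tails_per_relation(train, k=2))  # {0: [5, 7], 1: [8]}
```

If the real pipeline breaks frequency ties differently (e.g. by entity id or degree in the full graph), the tail end of each candidate list would diverge, which would be consistent with a sub-1% difference.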