txsun1997/CoLAKE

Where is "wikidata5m_triplet.txt"?

GX77 opened this issue · 5 comments

GX77 commented

"This is to remove FewRel test set from our training data. If your need is not just reproducing the experiments,you can discard this part. The ernie_data is obtained from https://github.com/thunlp/ERNIE"
Does this comment mean that, if I only want to retrain the model, I just need to comment out the following lines?
fewrel_triples = set()
'''
with open('../ernie_data/fewrel/test.json', 'r', encoding='utf-8') as fin:
    fewrel_data = json.load(fin)
    for ins in fewrel_data:
        r = ins['label']
        h, t = ins['ents'][0][0], ins['ents'][1][0]
        fewrel_triples.add((h, r, t))
print('# triples in FewRel test set: {}'.format(len(fewrel_triples)))
print(list(fewrel_triples)[0])
'''

If you are not aiming for a fair comparison with previous methods in a published paper, you can comment out these lines.
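
As an alternative to commenting the block in and out, the FewRel filter could be made optional. The sketch below is an assumption and not code from the repository: `load_fewrel_triples` is a hypothetical helper that reads the same fields as the quoted block and returns an empty set when the test file is missing, which has the same effect as commenting the block out.

```python
import json
import os


def load_fewrel_triples(path):
    """Return the (head, relation, tail) triples found in a FewRel-style JSON file.

    Hypothetical helper: if `path` does not exist, an empty set is returned,
    so no triples are filtered out of the training data.
    """
    triples = set()
    if not os.path.exists(path):
        return triples
    with open(path, 'r', encoding='utf-8') as fin:
        for ins in json.load(fin):
            r = ins['label']
            # Each entity entry starts with its ID, as in the quoted block.
            h, t = ins['ents'][0][0], ins['ents'][1][0]
            triples.add((h, r, t))
    return triples
```

With this, `fewrel_triples = load_fewrel_triples('../ernie_data/fewrel/test.json')` would work whether or not the ERNIE data has been downloaded.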

GX77 commented
with open("../wikidata5m_triplet.txt", 'r', encoding='utf-8') as fin:
    lines = fin.readlines()
    for i in tqdm(range(len(lines))):
        line = lines[i]
        v = line.strip().split("\t")
        if len(v) != 3:
            continue
        h, r, t = v
        if (h, r, t) not in fewrel_triples:
            if h in head_cluster:
                head_cluster[h].append((r, t))
            else:
                head_cluster[h] = [(r, t)]
            if t in tail_cluster:
                tail_cluster[t].append((r, h))
            else:
                tail_cluster[t] = [(r, h)]
        else:
            num_del += 1
        total += 1

Hello, do these lines need to be commented out as well? I only commented out "with open('../ernie_data/fewrel/test.json', 'r', encoding='utf-8') as fin:".

No, do not comment these out.
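
The clustering loop still works when the FewRel block is disabled, because `fewrel_triples` is then an empty set and the membership check never matches. The following is a minimal sketch of that behavior, assuming the same tab-separated triple format; `cluster_triples` and the sample lines are hypothetical and not part of the repository.

```python
from collections import defaultdict


def cluster_triples(lines, fewrel_triples=frozenset()):
    """Group tab-separated (head, relation, tail) lines by head and by tail.

    Hypothetical helper mirroring the loop quoted above. `lines` is any
    iterable of strings; `fewrel_triples` holds the triples to exclude.
    """
    head_cluster, tail_cluster = defaultdict(list), defaultdict(list)
    num_del = total = 0
    for line in lines:
        v = line.strip().split("\t")
        if len(v) != 3:
            continue  # skip malformed lines
        h, r, t = v
        total += 1
        if (h, r, t) in fewrel_triples:
            num_del += 1  # triple appears in the FewRel test set
            continue
        head_cluster[h].append((r, t))
        tail_cluster[t].append((r, h))
    return head_cluster, tail_cluster, num_del, total


# Made-up sample data; with the default empty filter nothing is removed.
sample = ["Q1\tP31\tQ2\n", "Q1\tP17\tQ3\n", "malformed line\n"]
heads, tails, num_del, total = cluster_triples(sample)
```

Because the function accepts any iterable, an open file handle could be passed directly instead of calling `readlines()`, which avoids holding the whole file in memory as a list.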

GX77 commented