Where is "wikidata5m_triplet.txt"?
GX77 opened this issue · 5 comments
"This is to remove the FewRel test set from our training data. If your need is not just reproducing the experiments, you can discard this part. The ernie_data is obtained from https://github.com/thunlp/ERNIE"
Does this comment mean that if I want to retrain the model, I only need to comment out the lines below?
fewrel_triples = set()
'''
with open('../ernie_data/fewrel/test.json', 'r', encoding='utf-8') as fin:
    fewrel_data = json.load(fin)
for ins in fewrel_data:
    r = ins['label']
    h, t = ins['ents'][0][0], ins['ents'][1][0]
    fewrel_triples.add((h, r, t))
print('# triples in FewRel test set: {}'.format(len(fewrel_triples)))
print(list(fewrel_triples)[0])
'''
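The optional FewRel step can also be expressed as a switch instead of commenting code out. This is a minimal sketch, assuming the FewRel JSON layout that the snippet above reads (relation id under `label`, head and tail entity ids at `ents[0][0]` and `ents[1][0]`); `load_excluded_triples` and its `enabled` parameter are hypothetical names, not part of the repository.

```python
import json
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def load_excluded_triples(path: str, enabled: bool = True) -> Set[Triple]:
    """Hypothetical helper: collect (head, relation, tail) triples to exclude.

    Assumes a JSON list of instances, each with a relation id in 'label'
    and entity ids at ins['ents'][0][0] (head) and ins['ents'][1][0] (tail).
    """
    triples: Set[Triple] = set()
    if not enabled:
        # Same effect as commenting out the block: the set stays empty,
        # so no Wikidata5M triple will be filtered out later.
        return triples
    with open(path, 'r', encoding='utf-8') as fin:
        for ins in json.load(fin):
            h, t = ins['ents'][0][0], ins['ents'][1][0]
            triples.add((h, ins['label'], t))
    return triples
```

For example, `fewrel_triples = load_excluded_triples('../ernie_data/fewrel/test.json', enabled=False)` leaves `fewrel_triples` as an empty set without touching the file, which matches the commented-out version.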
Unless you need a fair comparison with previous methods for a paper, you can comment out these lines.
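with open("../wikidata5m_triplet.txt", 'r', encoding='utf-8') as fin:
    lines = fin.readlines()
# head_cluster, tail_cluster, num_del and total are defined elsewhere in the original script (not shown here)
for line in tqdm(lines):
    v = line.strip().split("\t")
    if len(v) != 3:
        continue  # skip lines that are not exactly head, relation, tail
    h, r, t = v
    if (h, r, t) not in fewrel_triples:
        # index the triple by its head entity and by its tail entity
        if h in head_cluster:
            head_cluster[h].append((r, t))
        else:
            head_cluster[h] = [(r, t)]
        if t in tail_cluster:
            tail_cluster[t].append((r, h))
        else:
            tail_cluster[t] = [(r, h)]
    else:
        num_del += 1  # triple also appears in the FewRel test set
    total += 1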
Hi, so do these lines also need to be commented out? I only commented out “ with open('../ernie_data/fewrel/test.json', 'r', encoding='utf-8') as fin:”
No, leave these uncommented.
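The reason the Wikidata5M loop stays is that it is what builds the head and tail clusters; the FewRel check inside it is only a filter. With the FewRel block commented out, `fewrel_triples` remains the empty `set()`, so every well-formed triple is kept and `num_del` stays 0. Below is a minimal sketch of that loop as a function, assuming tab-separated head<TAB>relation<TAB>tail lines and that `total` is incremented once per well-formed line. `build_clusters` is a hypothetical name, and it swaps the explicit if/else for `collections.defaultdict`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def build_clusters(lines: Iterable[str], excluded: Set[Triple]):
    """Hypothetical helper: group (head, relation, tail) lines by head and by tail.

    Malformed lines are skipped; triples found in `excluded` are counted in
    num_del instead of being clustered; total counts well-formed lines.
    """
    head_cluster: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    tail_cluster: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    num_del = total = 0
    for line in lines:
        v = line.strip().split("\t")
        if len(v) != 3:
            continue  # not exactly three tab-separated fields
        h, r, t = v
        if (h, r, t) in excluded:
            num_del += 1  # held-out triple, e.g. from the FewRel test set
        else:
            head_cluster[h].append((r, t))  # outgoing edge of h
            tail_cluster[t].append((r, h))  # incoming edge of t
        total += 1
    return dict(head_cluster), dict(tail_cluster), num_del, total
```

Passing an empty set reproduces the "FewRel block commented out" case: the clusters are still built, just without any filtering. Because the function accepts any iterable of lines, you could also pass the open file handle directly (inside `with open("../wikidata5m_triplet.txt", encoding='utf-8') as fin:`) instead of calling `readlines()`, which avoids holding the whole file in memory.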