A question about the code

Question

bwxing opened this issue 6 years ago · 1 comment

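In utils.py, is it the case that only the words from the contexts are added to the vocabulary, and the words from the aspects are never added to word2id?
Below is your code; the commented-out line is the one I added. When I enable it so that the aspect words are included, accuracy drops. Could you help explain why?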
    words = []
    lines = open(train_fname, 'r').readlines()
    for i in range(0, len(lines), 3):
        sptoks = nlp(lines[i].strip())
        words.extend([sp.text.lower() for sp in sptoks])
        if len(sptoks) - 1 > max_context_len:
            max_context_len = len(sptoks) - 1
        sptoks = nlp(lines[i + 1].strip())
        ##words.extend([sp.text.lower() for sp in sptoks])
        if len(sptoks) > max_aspect_len:
            max_aspect_len = len(sptoks)
    word_count = Counter(words).most_common()
    for word, _ in word_count:
        if word not in word2id and ' ' not in word and '\n' not in word and 'aspect_term' not in word:
            word2id[word] = len(word2id)
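
For reference, here is a minimal, self-contained sketch of the same vocabulary-building step with the aspect lines optionally included. It is not the repository's code: the function name `build_vocab`, the `include_aspects` flag, the `<pad>` entry, and the three-lines-per-sample layout (context, aspect, label) are assumptions made for illustration, and plain whitespace splitting stands in for the spaCy tokenizer used in the excerpt above.

```python
from collections import Counter


def build_vocab(lines, include_aspects=True):
    """Build a word-to-id mapping from samples stored as 3 lines each.

    Assumed layout per sample: context line, aspect line, label line.
    """
    words = []
    for i in range(0, len(lines) - 2, 3):
        # Whitespace tokenization is a stand-in for the spaCy tokenizer.
        words.extend(lines[i].strip().lower().split())
        if include_aspects:
            # This is the step that the commented-out line would enable.
            words.extend(lines[i + 1].strip().lower().split())

    word2id = {'<pad>': 0}  # hypothetical padding entry
    # Assign ids in order of decreasing frequency, skipping the placeholder.
    for word, _ in Counter(words).most_common():
        if word not in word2id and 'aspect_term' not in word:
            word2id[word] = len(word2id)
    return word2id


sample = [
    "the aspect_term was great but service slow\n",
    "battery life\n",
    "1\n",
]
with_aspects = build_vocab(sample, include_aspects=True)
without_aspects = build_vocab(sample, include_aspects=False)
```

With `include_aspects=False`, the tokens `battery` and `life` never receive an id, so at lookup time they would have to be handled as unknown words or cause a missing-key error, depending on how the rest of the pipeline reads `word2id`.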

Answer 1 · 2018-08-31T07:20:22.000Z

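There is indeed a problem here. Thank you for pointing it out.
The drop in accuracy may be caused by model instability, which can result from the hyperparameter settings, the small number of samples in the dataset, and similar factors. I usually run the experiment several times and report the median as the result; you could also use the mean. In theory, a minor change like this should not have a large effect on the results.
As for the dataset preprocessing, I will revise it to follow the approach in the original paper and simplify the code.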
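
The advice above about repeating the experiment and reporting the median can be sketched as follows. This is only an illustration: `run_experiment` is a hypothetical stand-in for one full training-and-evaluation run, and the numbers it returns are random placeholders, not results from this model.

```python
import random
import statistics


def run_experiment(seed):
    """Hypothetical stand-in for one training run.

    Returns a placeholder accuracy; replace with real training and evaluation.
    """
    rng = random.Random(seed)
    return 0.75 + rng.uniform(-0.02, 0.02)


def summarize(accuracies):
    """Summarize repeated runs; the median is robust to a single bad run."""
    return {
        'median': statistics.median(accuracies),
        'mean': statistics.mean(accuracies),
        'stdev': statistics.stdev(accuracies),
    }


# One run per seed, so each configuration is compared over the same seeds.
accuracies = [run_experiment(seed) for seed in range(5)]
summary = summarize(accuracies)
print(summary)
```

Running both variants (with and without aspect words in the vocabulary) over the same set of seeds, and comparing the medians alongside the spread, gives a clearer picture of whether the accuracy difference exceeds run-to-run noise.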