jacobswan1/Video2Commonsense

PPL indicator

Muccul opened this issue · 2 comments

```python
cmses = cms_list[random_id].split(';')[1:]
res[eval_id] = [cms]
gts[eval_id] = cmses
eval_id += 1
ppl_corpus = ''
for c in cmses:
    total_cms.add(c.lower())
    ppl_corpus += ' ' + c.lower()
tokens = nltk.word_tokenize(ppl_corpus)
unigram_model = unigram(tokens)
ppl_scores.append(perplexity(c.lower(), unigram_model))
# Compute PPL score
print('Perplexity score: ', sum(ppl_scores)/len(ppl_scores))
```
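For context, the snippet calls two helpers, `unigram` and `perplexity`, that are defined elsewhere in the evaluation script. A minimal sketch of what such unigram-perplexity helpers typically look like (an assumption for illustration; the repo's actual implementations may differ) is:

```python
import math
from collections import Counter

import nltk  # requires the 'punkt' tokenizer data for nltk.word_tokenize


def unigram(tokens):
    # Unigram model: relative frequency of each token in the corpus.
    counts = Counter(tokens)
    total = len(tokens)
    return {tok: cnt / total for tok, cnt in counts.items()}


def perplexity(sentence, model, eps=1e-12):
    # Perplexity of a sentence under the unigram model:
    # exp of the average negative log-probability of its tokens.
    # Unseen tokens get a tiny probability `eps` to avoid log(0).
    tokens = nltk.word_tokenize(sentence)
    if not tokens:
        return 0.0
    log_prob = sum(math.log(model.get(tok, eps)) for tok in tokens)
    return math.exp(-log_prob / len(tokens))
```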

Hi @jacobswan1,
This PPL metric computes the perplexity of one of the ground-truth sentences (the loop variable `c`), not the perplexity of the predicted sentence (`cms`).
How does this code produce the value reported in the paper?
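For illustration, the fix being described would be to score the predicted sentence `cms` against the unigram model built from the ground-truth sentences, rather than scoring the last GT sentence `c`. A minimal sketch, reusing the variable names from the snippet and the helper sketches above (not the repo's updated code):

```python
# Build the unigram model from the ground-truth commonsense sentences...
ppl_corpus = ''
for c in cmses:
    total_cms.add(c.lower())
    ppl_corpus += ' ' + c.lower()
tokens = nltk.word_tokenize(ppl_corpus)
unigram_model = unigram(tokens)

# ...but evaluate perplexity on the model's *prediction*, not on a GT sentence.
ppl_scores.append(perplexity(cms.lower(), unigram_model))
```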

Hi @Muccul, thanks for pointing this error out. Indeed I just ran a quick test and found it does give the wrong PPL score.

I'll address it and update the arXiv table accordingly; I don't believe it invalidates the paper's other claims.

Please stay tuned; I'm working on getting the correct PPL scores and will update them soon.

P.S. Many thanks again for letting me know! And thanks also to Weijiang for forwarding this issue to me.

Hi @Muccul, I've updated the PPL evaluation code, along with the other re-implemented baseline numbers and the pre-trained checkpoint.
Thanks again for pointing out the bug, and for all your help!