Did you use the test data during training in the Unsupervised Parsing experiment?
jiaxin96 opened this issue · 1 comment
jiaxin96 commented
On reviewing the following code, I find that the training data contains the test data. Is this correct?
Line 25 in 46d63cd
for id in file_ids:
    if 'WSJ/00/WSJ_0000.MRG' <= id <= 'WSJ/24/WSJ_2499.MRG':
        train_file_ids.append(id)
    if 'WSJ/22/WSJ_2200.MRG' <= id <= 'WSJ/22/WSJ_2299.MRG':
        valid_file_ids.append(id)
    if 'WSJ/23/WSJ_2300.MRG' <= id <= 'WSJ/23/WSJ_2399.MRG':
        test_file_ids.append(id)
    # elif 'WSJ/00/WSJ_0000.MRG' <= id <= 'WSJ/01/WSJ_0199.MRG' or 'WSJ/24/WSJ_2400.MRG' <= id <= 'WSJ/24/WSJ_2499.MRG':
    #     rest_file_ids.append(id)
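The overlap can be checked with a minimal sketch that mirrors the quoted `if` chain. It assumes the file ids are plain strings compared lexicographically, as in the snippet. The function name `split_file_ids` and the three sample ids are hypothetical and only serve as illustration; they are not taken from the repository.

```python
def split_file_ids(file_ids):
    """Mirror the quoted `if` chain: each range is tested independently."""
    train, valid, test = [], [], []
    for file_id in file_ids:
        # Sections 00-24: this range also covers sections 22 and 23.
        if 'WSJ/00/WSJ_0000.MRG' <= file_id <= 'WSJ/24/WSJ_2499.MRG':
            train.append(file_id)
        # Section 22 only.
        if 'WSJ/22/WSJ_2200.MRG' <= file_id <= 'WSJ/22/WSJ_2299.MRG':
            valid.append(file_id)
        # Section 23 only.
        if 'WSJ/23/WSJ_2300.MRG' <= file_id <= 'WSJ/23/WSJ_2399.MRG':
            test.append(file_id)
    return train, valid, test


# Hypothetical file ids, one from each section of interest.
sample_ids = [
    'WSJ/02/WSJ_0201.MRG',
    'WSJ/22/WSJ_2201.MRG',
    'WSJ/23/WSJ_2301.MRG',
]

train_ids, valid_ids, test_ids = split_file_ids(sample_ids)

# Ids that land in the train list and also in the valid or test list.
train_valid_overlap = sorted(set(train_ids) & set(valid_ids))
train_test_overlap = sorted(set(train_ids) & set(test_ids))

print(train_valid_overlap)
print(train_test_overlap)
```

Because the branches are separate `if` statements rather than `elif`, an id from section 22 or 23 is appended to `train_file_ids` as well as to its own list.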
shawntan commented
The distance values are extracted from models that were already trained on the PTB language modeling training set. There's no additional training involved when performing unsupervised parsing in our setup.