Glorf/recipenlg

train/val/test split


I downloaded the RecipeNLG dataset and found that it contains 588k examples from Recipe1M and 1.643M newly gathered examples. Recipe1M itself has three splits (train, val, and test) with 721k, 155k, and 154k examples respectively.

Could you clarify whether the Recipe1M val and test splits were included in the RecipeNLG dataset? I'm working on a project that uses the Recipe1M val and test splits, and I'd like to keep the same sets for evaluation. Before I train on RecipeNLG, I want to make sure that none of those val/test recipes appear in it, since that would leak evaluation data into training.
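
For reference, this is roughly the check I would run on my side if the splits cannot be confirmed. It is only a sketch: it matches recipes by normalized source URL, and the field names (`link` for RecipeNLG rows, `url` and `partition` for Recipe1M rows) are my assumptions about the two schemas, so they may need adjusting. The rows below are toy data, not real records.

```python
import re


def normalize_url(url: str) -> str:
    """Lowercase a URL and drop the scheme, a leading 'www.', and trailing slashes."""
    url = url.strip().lower()
    url = re.sub(r"^https?://", "", url)
    url = re.sub(r"^www\.", "", url)
    return url.rstrip("/")


def find_leaked_rows(train_rows, heldout_rows, train_key="link", heldout_key="url"):
    """Return the training rows whose normalized URL also occurs in the held-out rows.

    The default key names are assumptions about the RecipeNLG and Recipe1M
    schemas; pass different keys if the real files use other names.
    """
    heldout_urls = {
        normalize_url(row[heldout_key]) for row in heldout_rows if row.get(heldout_key)
    }
    return [
        row
        for row in train_rows
        if row.get(train_key) and normalize_url(row[train_key]) in heldout_urls
    ]


# Toy stand-ins for the two datasets (hypothetical values).
recipe1m = [
    {"url": "http://www.example.com/recipe/1/", "partition": "train"},
    {"url": "http://www.example.com/recipe/2", "partition": "val"},
    {"url": "http://www.example.com/recipe/3", "partition": "test"},
]
recipenlg = [
    {"link": "www.example.com/recipe/2", "source": "Recipes1M"},
    {"link": "www.example.com/recipe/4", "source": "Gathered"},
]

# Keep only the Recipe1M rows used for evaluation, then look for overlap.
heldout = [row for row in recipe1m if row["partition"] in {"val", "test"}]
leaked = find_leaked_rows(recipenlg, heldout)
print(f"{len(leaked)} RecipeNLG rows overlap with Recipe1M val/test")
```

URL matching would miss the same recipe scraped from a different page, so a clean result here would not fully rule out leakage. Comparing normalized titles or ingredient lists could serve as a second pass.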