More information about the train/val/test split
andradesalazar opened this issue · 2 comments
Hi all,
In your latest manuscript you mention that the 30 million PSMs from MassIVE-KB were randomly split so that the training, validation and test sets are disjoint at the peptide level.
I was wondering whether it's possible to provide more information about the split to be able to reproduce your results.
It would be enough to provide a simple table with the columns peptide and split (containing "train", "val", "test").
Thanks a lot in advance.
Best,
Daniela
Hi Daniela,
You can download the MassIVE-KB train, validation and test splits used for Casanovo training from here: https://noble.gs.washington.edu/~melih/mskb_casanovo_splits.zip
The zipped archive contains three MGF files, one for each split, and a Parquet file with metadata.
Please note that the dataset is only temporarily available at this URL; we'll find a permanent home for it soon.
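If a peptide/split table like the one requested above is still useful, it can probably be derived from the three MGF files. The following is only a minimal sketch, not the code used to create the splits. It assumes that each spectrum block carries its peptide in a `SEQ=` header line and that the annotation string as written there is the unit of the split (whether modifications are part of that unit is not stated in this thread). The function names are made up for illustration.

```python
import csv


def peptides_in_mgf(path, key="SEQ"):
    """Collect the distinct values of `<key>=` header lines in one MGF file.

    Assumption: the peptide annotation is stored as `SEQ=...`; pass a
    different `key` if the files use another field name.
    """
    prefix = key + "="
    peptides = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(prefix):
                peptides.add(line[len(prefix):])
    return peptides


def build_split_table(mgf_by_split, key="SEQ"):
    """Map every peptide to the set of split names it occurs in."""
    table = {}
    for split, path in mgf_by_split.items():
        for peptide in peptides_in_mgf(path, key):
            table.setdefault(peptide, set()).add(split)
    return table


def shared_peptides(table):
    """Return the peptides that occur in more than one split.

    An empty result means the splits are disjoint at the peptide level.
    """
    return {pep: splits for pep, splits in table.items() if len(splits) > 1}


def write_split_table(table, out_path):
    """Write a CSV with the columns `peptide` and `split`."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["peptide", "split"])
        for peptide in sorted(table):
            for split in sorted(table[peptide]):
                writer.writerow([peptide, split])
```

To use it, pass a dictionary such as `{"train": "train.mgf", "val": "val.mgf", "test": "test.mgf"}` to `build_split_table`, replacing these placeholder file names with the actual names inside the archive. Then check that `shared_peptides` returns an empty dictionary before calling `write_split_table`. The files are read line by line, so memory use grows with the number of distinct peptides rather than with the number of spectra.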
Thank you :)