Add function that verifies if the input dataset is valid
EdoardoAbatiTR commented
As described in the README, we have some requirements for the input dataset if the user decides to use the built-in pipelines:
If you are planning to use any of the included pipelines, you must have a dataset split into 3 files (train.csv, dev.csv and test.csv) that contain train, validation and test sets respectively. Each file must have the following columns:
id: an identifier for each sample, e.g. a document id
text: the input text
labels: the labels list as a string (e.g. "[LabelA, OtherLabel, LabelB]")
It would be great to have a function that helps users verify that their datasets fulfill these requirements.
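As a starting point for discussion, here is a rough sketch of what such a check might look like. The function name `validate_dataset`, the choice to return a list of problem messages, and the bracket-only check on `labels` are all assumptions for illustration, not existing API in this project. It uses only the standard library and checks only what the README states: the three files exist, each has the `id`, `text` and `labels` columns, and each `labels` value looks like a bracketed list.

```python
import csv
from pathlib import Path

# Requirements taken from the README.
REQUIRED_FILES = ("train.csv", "dev.csv", "test.csv")
REQUIRED_COLUMNS = ("id", "text", "labels")


def validate_dataset(data_dir):
    """Hypothetical helper: return a list of problems found in data_dir.

    An empty list means the dataset meets the README requirements
    as far as this sketch checks them.
    """
    problems = []
    for name in REQUIRED_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            problems.append(f"{name}: file not found")
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            header = reader.fieldnames or []
            missing = [col for col in REQUIRED_COLUMNS if col not in header]
            if missing:
                problems.append(f"{name}: missing columns {missing}")
                continue
            # Data rows are numbered from 1, not counting the header.
            for row_number, row in enumerate(reader, start=1):
                labels = (row["labels"] or "").strip()
                # Assumption: only the surrounding brackets are checked;
                # the label names themselves are not validated.
                if not (labels.startswith("[") and labels.endswith("]")):
                    problems.append(
                        f"{name}, row {row_number}: "
                        f"labels is not a bracketed list: {labels!r}"
                    )
    return problems
```

A caller could run `validate_dataset("path/to/data")` before starting a pipeline and print each returned message. Returning every problem at once, instead of raising on the first one, lets users fix their files in a single pass.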