Taking care of split:meta:int column present in the input csv
Closed this issue · 9 comments
see comment :
How do you see this happening? Currently, we throw an error if there is no split:meta:int column. I think that even if a split:meta:int column exists, it will be overwritten if a splitter is called; if a splitter isn't called, it will stay as is.
The first error is more than fine because it covers cases in which the user specified a split neither manually nor through a splitter scheme. I would like to have another error in case that column already exists and the split method is called on such a csv. Something like "a split meta column is already present but a splitter function has been called. Remove the split meta column if you want to call a split function."
Do you expect users to strip gold standard csvs (ones that are curated by us and provided with an already present split column) of their split column before doing analysis?
I would be more in favor of a warning than an error
@suzannejin this is useful for implementing the CsvParser class
OK, then if both are present a warning is sent and a new set is created according to the split function call. This one is a bit tricky though, so I'll need help/input on how to do it.
I guess this check should be at the level of the csv handler/parser. The experiment class given by the user should not take care of this. It is something that is up to us to handle.
Yes, this is done in Python of course, when the splitting method is called in the CsvParser class
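A minimal sketch of the behaviour discussed above, to make the three cases concrete: error when nothing defines the split, keep the existing column when no splitter is called, and warn then overwrite when both are present. Everything except the `split:meta:int` column name is an assumption for illustration: `resolve_split`, the list-of-dicts row representation, and the splitter signature are hypothetical and not the actual CsvParser API.

```python
import warnings

SPLIT_COLUMN = "split:meta:int"  # column name taken from this thread


def resolve_split(rows, splitter=None):
    """Return rows carrying a split column (hypothetical helper).

    rows: list of dicts, one per csv row (assumed in-memory representation).
    splitter: optional callable taking the rows and returning one int per row.
    """
    has_split = bool(rows) and all(SPLIT_COLUMN in row for row in rows)

    if splitter is None:
        if not has_split:
            # Neither a split column nor a splitter: nothing defines the split.
            raise ValueError(
                f"no {SPLIT_COLUMN} column found and no splitter was specified"
            )
        return rows  # existing split column stays as is

    if has_split:
        # Both present: warn, then let the splitter overwrite the column.
        warnings.warn(
            f"a {SPLIT_COLUMN} column is already present; "
            "it will be overwritten by the splitter"
        )

    labels = splitter(rows)
    return [{**row, SPLIT_COLUMN: label} for row, label in zip(rows, labels)]
```

Putting the check in the parser-level helper rather than in the user's experiment class matches the point above that this is something for us to handle.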
Are there going to be csv_add_noise and csv_split_set functions? Or is it all up to the user in the experiment class?
What if split is by itself an [optional] category, instead of being inside meta?
Then if split is given, it parses the splits. And if not, it does it by default.
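To illustrate the idea of split as its own optional category, here is a rough sketch of how the header could be inspected. It assumes the `name:category:type` column naming seen in `split:meta:int`; the function names, the `split` category label, and the example headers (`seq:input:str`, `fold:split:int`) are made up for illustration.

```python
def parse_column(header):
    """Split a 'name:category:type' header into its three parts."""
    name, category, dtype = header.split(":")
    return name, category, dtype


def find_split_column(headers):
    """Return the header whose category is 'split', or None if absent."""
    split_cols = [h for h in headers if parse_column(h)[1] == "split"]
    if len(split_cols) > 1:
        raise ValueError(f"expected at most one split column, got {split_cols}")
    return split_cols[0] if split_cols else None


# With a split column, the parser would read the splits from it.
print(find_split_column(["seq:input:str", "fold:split:int"]))
# Without one, it returns None and the default splitting would apply.
print(find_split_column(["seq:input:str"]))
```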