Taking care of split:meta:int column present in the input csv

Question


Closed this issue 10 months ago · 9 comments

See comment:

How do you see this happening? Currently, we throw an error if there is no split:meta:int column. I think that even if a split:meta:int column exists, it will be overwritten if a splitter is called; if a splitter isn't called, it will stay as is.
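The current behavior described in the question can be sketched as below. This is a minimal sketch, not the project's code. It assumes the csv is read with Python's standard `csv` module, that `split:meta:int` holds one integer split label per row, and that a splitter is any callable that takes the rows and returns one label per row. `resolve_splits` and the `age:input:float` column are hypothetical.

```python
import csv
import io
from typing import Callable, Optional

# Column name taken from the issue; everything else is a hypothetical sketch.
SPLIT_COLUMN = "split:meta:int"


def resolve_splits(
    csv_text: str,
    splitter: Optional[Callable[[list[dict[str, str]]], list[int]]] = None,
) -> list[int]:
    """Return one split label per row, following the behavior described above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = reader.fieldnames or []
    if splitter is not None:
        # A splitter was called: any existing split column is silently overwritten.
        return splitter(rows)
    if SPLIT_COLUMN not in fieldnames:
        # No splitter and no split column: there is nothing to split on.
        raise ValueError(f"no '{SPLIT_COLUMN}' column found and no splitter was given")
    # No splitter: keep the split column as is.
    return [int(row[SPLIT_COLUMN]) for row in rows]
```

For example, with `data = "age:input:float,split:meta:int\n30,0\n40,1\n"`, `resolve_splits(data)` keeps the labels `[0, 1]`, while `resolve_splits(data, splitter=lambda rows: [2] * len(rows))` replaces them with `[2, 2]` without any notice.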

Answer 1 · 2024-03-13T11:10:41.000Z

The first error is more than fine because it covers cases in which the user did not specify a splitter scheme, either manually or explicitly. I would like to have another error in case that column already exists and the split method is called on such a csv, something like: "A split meta column is already present but a splitter function has been called. Remove the split meta column if you want to call a split function."
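The extra error proposed here amounts to a single guard run before any splitter. A minimal sketch, assuming the parser knows the csv header and whether a splitter was requested; `check_split_conflict` and `SPLIT_CONFLICT_MESSAGE` are hypothetical names, and only the message wording comes from the comment above.

```python
# Column name from the issue; the function and constant names are hypothetical.
SPLIT_COLUMN = "split:meta:int"

SPLIT_CONFLICT_MESSAGE = (
    "A split meta column is already present but a splitter function has been called. "
    "Remove the split meta column if you want to call a split function."
)


def check_split_conflict(fieldnames: list[str], splitter_called: bool) -> None:
    """Raise if the csv already has a split column and a splitter is also requested."""
    if splitter_called and SPLIT_COLUMN in fieldnames:
        raise ValueError(SPLIT_CONFLICT_MESSAGE)
```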

Answer 2 · 2024-03-13T11:12:32.000Z

Do you expect users to strip the split column from gold-standard csvs (i.e. ones that are curated by us and provided with a split column already present) before doing analysis?

Answer 3 · 2024-03-13T11:12:52.000Z

I would be more in favor of a warning than an error

Answer 4 · 2024-03-13T11:14:32.000Z

@suzannejin this is useful for implementing the CsvParser class

Answer 5 · 2024-03-13T11:15:00.000Z

OK, then if both are present, a warning is issued and a new split is created according to the split function call. This one is a bit tricky, though, so I'll need help/input on how to do it.

Answer 6 · 2024-03-13T11:17:18.000Z

I guess this check should be at the level of the csv handler/parser; the experiment class given by the user should not take care of this. It is something for us to handle.

Answer 7 · 2024-03-13T11:19:20.000Z

Yes, this is done in Python, of course, when the splitting method is called in the CsvParser class.
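The approach settled on above (warn, then create a new split, with the check living in the parser rather than in the user's experiment class) could look roughly like this. The `CsvParser` below is a hypothetical stand-in, not the real class: its constructor, the `split` method, the `test_fraction` and `seed` parameters, and the 0 = train / 1 = test encoding are all assumptions.

```python
import csv
import io
import random
import warnings
from typing import Optional

# Column name from the issue; the class below is a hypothetical stand-in.
SPLIT_COLUMN = "split:meta:int"


class CsvParser:
    """Minimal sketch of a parser that owns the split-column check."""

    def __init__(self, csv_text: str) -> None:
        reader = csv.DictReader(io.StringIO(csv_text))
        self.rows = list(reader)
        self.fieldnames = list(reader.fieldnames or [])

    def split(self, test_fraction: float = 0.2, seed: Optional[int] = None) -> None:
        """Assign a random train (0) / test (1) split, replacing any existing one."""
        if SPLIT_COLUMN in self.fieldnames:
            # Both a split column and a split call: warn instead of raising.
            warnings.warn(
                f"'{SPLIT_COLUMN}' is already present but a split function was called; "
                "the existing split will be overwritten.",
                UserWarning,
                stacklevel=2,
            )
        else:
            self.fieldnames.append(SPLIT_COLUMN)
        rng = random.Random(seed)
        for row in self.rows:
            # Values are stored as strings, matching what csv.DictReader produces.
            row[SPLIT_COLUMN] = "1" if rng.random() < test_fraction else "0"
```

Keeping the check inside `split` means the user's experiment class never has to inspect the columns itself; it only decides whether to call the splitter.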

Answer 8 · 2024-03-13T11:23:25.000Z

Are there going to be csv_add_noise and csv_split_set functions? Or is it all up to the user in the experiment class?

Answer 9 · 2024-03-13T14:14:11.000Z

What if split is its own [optional] category instead of being inside meta?
Then, if split is given, the parser reads the provided splits; if not, it creates them by default.
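A rough sketch of the optional `split` category idea, assuming columns keep the same `name:category:type` pattern as `split:meta:int`, so a user-provided split would be any column whose middle field is `split`. `find_split_column`, `load_splits`, the `fold:split:int` column name and the random 0 = train / 1 = test default are all hypothetical.

```python
import csv
import io
import random
from typing import Optional


def find_split_column(fieldnames: list[str]) -> Optional[str]:
    """Return the first column whose category (middle field) is 'split', if any."""
    for name in fieldnames:
        parts = name.split(":")
        if len(parts) == 3 and parts[1] == "split":
            return name
    return None


def load_splits(
    csv_text: str, test_fraction: float = 0.2, seed: Optional[int] = None
) -> list[int]:
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    column = find_split_column(list(reader.fieldnames or []))
    if column is not None:
        # Split category given by the user: parse it as is.
        return [int(row[column]) for row in rows]
    # No split category: fall back to a default random split (0 = train, 1 = test).
    rng = random.Random(seed)
    return [1 if rng.random() < test_fraction else 0 for _ in rows]
```

With this layout, an old-style `split:meta:int` column is no longer treated as a split at all, which is the point of moving split out of meta.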