EpistasisLab/pmlb

regarding the 6 mfeat datasets

greggj2016 opened this issue · 4 comments

The mfeat_pixel dataset has target values in the range [0,9], while the other mfeat datasets have target values in the range [1,10].
These datasets are all part of the same collection (https://archive.ics.uci.edu/ml/datasets/Multiple+Features), so they need to use the same target values, because people might compare the six datasets with one another. I suggest changing the other five datasets' target values from [1,10] to [0,9], because that is consistent with the primary source's site, as well as with your current target annotation in the other datasets that I've looked at.

Come to think of it, the target values should also be [0,9] because the features are statistics that describe images of the characters 0, 1, 2, ..., 9.
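To make the proposed change concrete, here is a minimal sketch of the re-encoding in plain Python. It assumes the [1,10] labels are simply the digit plus one (so label 1 means digit 0 and label 10 means digit 9); I have not verified that mapping against the source files, and the function name `recode_one_based_targets` is hypothetical, not part of PMLB.

```python
def recode_one_based_targets(targets):
    """Map class labels from the range [1, 10] to the range [0, 9].

    Assumption (not verified against the source data): each label is
    the digit it represents plus one, so subtracting 1 recovers the digit.
    """
    recoded = []
    for value in targets:
        label = int(value)
        # Refuse anything outside [1, 10] so an already-recoded column
        # (which may contain 0) is not shifted a second time.
        if not 1 <= label <= 10:
            raise ValueError(f"label {label} is outside the expected range [1, 10]")
        recoded.append(label - 1)
    return recoded


# Made-up target column using the [1, 10] encoding, for illustration only.
old_targets = [1, 2, 10, 5, 1]
new_targets = recode_one_based_targets(old_targets)
print(new_targets)  # [0, 1, 9, 4, 0]
```

The range check is there so the function fails loudly on a column such as mfeat_pixel's, which already uses [0,9], instead of silently producing a -1 label.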

Thank you, @greggj2016, for raising this issue. @weixuanfu, could we discard this recoding of the target column and use the original one? The same applies to #127.

I think it is OK to re-encode the target based on its source.

Closed by #133.