covartech/PRT

proposal: remove prtDataSetCellArray

patrickkwang opened this issue · 3 comments

I believe that prtDataSetClassReshape solves the same problem better.

Three things use prtDataSetCellArray: prtDataGenCifar1, prtDataSetTimeSeries, and prtDataGenMsrcorid. The first two appear to be replaceable with prtDataSetClassReshape with no issues. For prtDataGenMsrcorid, the observations actually have different shapes. It's an older dataset anyway and has probably been superseded by something like ImageNet, but I don't know that there's a good reason to allow prtDataSets with observations of arbitrary sizes.

Are there other use cases for this of which I'm not aware?

I think you're right that most of the CIFAR/image ones can be done with prtDataSetClassReshape as long as the image chips are the same size. BUT prtDataSetCellArray is specifically for the case where observations are different sizes. For example, prtDataSetTimeSeries can handle time series of arbitrary length, and that's important, I think?

I propose switching anything that CAN use prtDataSetClassReshape to use it, but keeping prtDataSetCellArray, since its specific use case is observations of varying sizes: images of different dimensions, graphs, time series of unknown length, etc.

@peterTorrione pointed out that the PRT machinery is useful for data sets with observations of different sizes, provided that you have some sort of shape/size-invariant feature extractor (as a prtAction). I can't find any such feature extractor currently included with PRT, but nevertheless I concede the point.

When observations have a consistent shape, as with prtDataGenCifar1, prtDataSetClassReshape should be used instead of prtDataSetCellArray. It stores data more efficiently and provides some additional useful methods.

Just as an example: I think you can actually do this (classify time series of varying length without a separate feature extractor) with

dsTimeSeries = ... % time series data as a prtDataSetCell
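classifier = prtClassMap('rvs',prtRvHmm); % named "classifier" to avoid shadowing MATLAB's built-in class()
yOut = classifier.kfolds(dsTimeSeries);   % cross-validate on the time-series data set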

But we're in agreement.