Cross validation dataset iterators are written specifically for DenseDesignMatrix
se4u opened this issue · 0 comments
- The code in `pylearn2/cross_validation/dataset_iterators.py` calls `n = dataset.X.shape[0]` in many places. These statements should be changed to `n = dataset.get_num_examples()`, because that is the standard way according to the base interface.
- The code of the `DatasetCV` class in `pylearn2/cross_validation/dataset_iterators.py` is specialized to the `DenseDesignMatrix` constructor and does not work with `VectorSpacesDataset`. In the absence of a standard constructor in the `Dataset` abstract interface, I added some exception handling and an assertion to check that the datasets used during cross validation have the same type as the original dataset. This is not a great solution, but at least the assertion is useful.
- X, y = data
- datasets[label] = DenseDesignMatrix(X=X, y=y)
+ try:
+     X, y = data
+     data_subset = DenseDesignMatrix(
+         X=X, y=y, X_labels=self.dataset.X_labels,
+         y_labels=self.dataset.y_labels)
+ except:
+     data_subset = self.dataset.__class__(
+         data=data, data_specs=self.dataset.data_specs)
+ assert isinstance(data_subset, self.dataset.__class__)
+ datasets[label] = data_subset
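To make the two points above concrete, here is a minimal, self-contained sketch. It does not import pylearn2: `DenseLike` and `TupleLike` are hypothetical stand-ins for `DenseDesignMatrix` and `VectorSpacesDataset`, `make_subset` is a made-up helper, and the `data_specs` value is only a placeholder. It also swaps the bare `except` from the diff for an explicit type check, which is my own variation rather than what the patch does.

```python
class DenseLike(object):
    """Hypothetical stand-in for a dense dataset that exposes X and y."""

    def __init__(self, X, y):
        self.X = X
        self.y = y

    def get_num_examples(self):
        return len(self.X)


class TupleLike(object):
    """Hypothetical stand-in for a dataset that stores a tuple of sources
    and therefore has no X attribute at all."""

    def __init__(self, data, data_specs):
        self.data = data
        self.data_specs = data_specs

    def get_num_examples(self):
        return len(self.data[0])


def make_subset(original, data):
    """Rebuild a subset using the same class as the original dataset."""
    if isinstance(original, DenseLike):
        X, y = data
        subset = DenseLike(X=X, y=y)
    else:
        # Assumes the class accepts data= and data_specs= keyword arguments.
        subset = original.__class__(data=data,
                                    data_specs=original.data_specs)
    if not isinstance(subset, original.__class__):
        raise TypeError("subset type %s does not match original type %s"
                        % (type(subset).__name__, type(original).__name__))
    return subset


dense = DenseLike(X=[[1, 2], [3, 4], [5, 6]], y=[0, 1, 0])
multi = TupleLike(data=([[1], [2]], [[0], [1]]), data_specs=("a", "b"))

# get_num_examples() works for both; dataset.X.shape[0] would fail on `multi`.
for ds in (dense, multi):
    print(type(ds).__name__, ds.get_num_examples())
```

The type check at the end of `make_subset` plays the same role as the assertion in the diff: if the original dataset is a subclass with its own constructor, the mismatch is reported instead of silently handing back a different kind of dataset.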
- Another thing (not an issue): the following check in the constructor of `FiniteDatasetIterator` in `pylearn2/utils/iteration.py` fails unless the YAML file declares `data_specs` with a `!!python/tuple` directive. A helpful message suggesting this fix could be added at this point.
# Code that fails when source is a list instead of a tuple.
if not isinstance(source, tuple):
    source = (source,)
...
assert len(convert) == len(source), "Try and change dataset data_specs" + \
    " in yaml file to !!python/tuple [ 'a', 'b']"
# Fixed yaml file.
data_specs: !!python/tuple [ 'a', 'b'],
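
For anyone wondering why a list trips this check, here is a small standalone sketch. `check_source` is a hypothetical helper that mirrors the two statements quoted above, not the actual pylearn2 code, and it raises `ValueError` instead of using `assert` so that the hint survives `python -O`. A plain YAML sequence is loaded as a Python list, the list is wrapped into a one-element tuple, and the length comparison then fails.

```python
def check_source(source, convert):
    """Hypothetical helper mirroring the tuple check quoted above."""
    if not isinstance(source, tuple):
        # A list such as ['a', 'b'] ends up here and becomes (['a', 'b'],),
        # a tuple of length 1 rather than length 2.
        source = (source,)
    if len(convert) != len(source):
        raise ValueError(
            "Expected %d source(s) but got %d. If data_specs comes from a "
            "YAML file, try marking it as a tuple, e.g. "
            "!!python/tuple ['a', 'b']." % (len(convert), len(source)))
    return source


# A tuple passes through unchanged; a single name is wrapped.
print(check_source(("a", "b"), [None, None]))
print(check_source("a", [None]))

# A list of two names is treated as ONE source and triggers the hint.
try:
    check_source(["a", "b"], [None, None])
except ValueError as err:
    print(err)
```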