lisa-lab/pylearn2

Cross validation dataset iterators are written specifically for DenseDesignMatrix

se4u commented
  1. The code in pylearn2/cross_validation/dataset_iterators.py calls n = dataset.X.shape[0] in many places. These statements should be changed to n = dataset.get_num_examples(), because that is the standard way to get the number of examples according to the base Dataset interface, and it does not assume that the dataset has an X attribute.
  2. The DatasetCV class in pylearn2/cross_validation/dataset_iterators.py is specialized to the DenseDesignMatrix constructor and does not work with VectorSpacesDataset. Since the Dataset abstract interface does not define a standard constructor, I added some exception handling and an assertion that checks that the datasets built during cross validation have the same type as the original dataset. This is not a great solution, but at least the assertion is useful.
-                X, y = data
-                datasets[label] = DenseDesignMatrix(X=X, y=y)
+                try:
+                    X, y = data
+                    data_subset = DenseDesignMatrix(
+                        X=X, y=y, X_labels=self.dataset.X_labels,
+                        y_labels=self.dataset.y_labels)
+                except Exception:
+                    data_subset = self.dataset.__class__(
+                        data=data, data_specs=self.dataset.data_specs)
+                assert isinstance(data_subset, self.dataset.__class__)
+                datasets[label] = data_subset
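
A minimal sketch of the two points above, using stand-in classes rather than the real pylearn2 ones. ToyDense, ToySpaces and the helper make_subset are hypothetical names for illustration and are not part of pylearn2. The sketch reads the dataset size through get_num_examples() instead of X.shape[0], and chooses the constructor with an explicit type check rather than a broad try/except, so unrelated errors are not silently swallowed.

```python
class ToyDense(object):
    """Hypothetical stand-in for a design-matrix style dataset."""

    def __init__(self, X, y=None):
        self.X = X
        self.y = y

    def get_num_examples(self):
        return len(self.X)


class ToySpaces(object):
    """Hypothetical stand-in for a dataset built from (data, data_specs)."""

    def __init__(self, data, data_specs):
        self.data = data
        self.data_specs = data_specs

    def get_num_examples(self):
        # Every source holds the same number of rows; use the first one.
        return len(self.data[0])


def make_subset(dataset, data):
    """Build a subset with the same class as `dataset` (illustrative helper)."""
    if isinstance(dataset, ToyDense):
        X, y = data
        subset = ToyDense(X=X, y=y)
    else:
        # Assumes the constructor accepts `data` and `data_specs` keywords.
        subset = dataset.__class__(data=data, data_specs=dataset.data_specs)
    assert isinstance(subset, dataset.__class__)
    return subset


dense = ToyDense(X=[[0, 1], [2, 3], [4, 5]], y=[0, 1, 0])
spaces = ToySpaces(data=([[0], [1]], [[1], [0]]),
                   data_specs=("space", ("a", "b")))

n_dense = dense.get_num_examples()    # no reliance on dataset.X
n_spaces = spaces.get_num_examples()  # works although there is no X attribute

dense_train = make_subset(dense, ([[0, 1], [2, 3]], [0, 1]))
spaces_train = make_subset(spaces, ([[0]], [[1]]))
```

A type check like this still hard-codes one special case; a standard constructor or a subset method on the Dataset interface would remove it, but that is a design change beyond this patch.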
  • Another thing (not an issue): the following check in the FiniteDatasetIterator constructor in pylearn2/utils/iteration.py fails unless the data_specs in the yaml file are written with the !!python/tuple directive. A plain yaml sequence is loaded as a list, so the list is wrapped in a one-element tuple and the length assertion no longer holds. A helpful message could be added at this point suggesting this fix.
# Code that fails when source is a list instead of a tuple.
if not isinstance(source, tuple):
    source = (source,)
...
assert len(convert) == len(source), (
    "Try changing the dataset data_specs in the yaml file to "
    "!!python/tuple ['a', 'b']")
# Fixed yaml file.
data_specs: !!python/tuple [ 'a', 'b'],
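
To illustrate why a list trips the assertion, the sketch below reproduces the wrapping logic in plain Python. wrap_like_iterator mirrors the quoted check; normalize_source and its error messages are hypothetical and are not the code in pylearn2/utils/iteration.py. It assumes convert holds one entry per source, as the quoted length check implies.

```python
def wrap_like_iterator(source):
    """Mirror the quoted check: anything that is not a tuple gets wrapped."""
    if not isinstance(source, tuple):
        source = (source,)
    return source


def normalize_source(source, convert):
    """Hypothetical variant that reports a list with an actionable message."""
    if isinstance(source, list):
        raise TypeError(
            "data_specs source is a list %r; in the yaml file write it as "
            "!!python/tuple %r" % (source, source))
    source = wrap_like_iterator(source)
    if len(convert) != len(source):
        raise ValueError("expected %d sources, got %d"
                         % (len(convert), len(source)))
    return source


convert = [None, None]  # assumed: one converter slot per source

from_list = wrap_like_iterator(['a', 'b'])    # what a plain yaml list becomes
from_tuple = wrap_like_iterator(('a', 'b'))   # what !!python/tuple yields

try:
    normalize_source(['a', 'b'], convert)
    message = None
except TypeError as exc:
    message = str(exc)
```

Here from_list has length 1 while convert has length 2, which is the mismatch the assertion catches; from_tuple keeps its two entries.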