lisa-lab/pylearn2

[bug] show_examples.py assumes yaml file contains a dataset

Opened this issue · 1 comment

TNick commented

This section from show_examples.py:

    if path.endswith('.pkl'):
        from pylearn2.utils import serial
        obj = serial.load(path)
    elif path.endswith('.yaml'):
        print('Building dataset from yaml...')
        obj = yaml_parse.load_path(path)
        print('...done')
    else:
        obj = yaml_parse.load(path)

    if hasattr(obj, 'get_batch_topo'):
        # obj is a Dataset
        dataset = obj

        examples = dataset.get_batch_topo(rows*cols)
    else:
        # obj is a Model
        model = obj

assumes that an object loaded from a .yaml file is a dataset.
I'm not sure how often that is the case, but we could at least cover the case where the top-level object is a Train object or a list of Train objects. Something like this:

    from pylearn2.train import Train
    from pylearn2.models.model import Model
    from pylearn2.datasets import Dataset

    # ...

    # Do we have a readily available dataset or do we need to load it?
    is_dataset = False

    # only deal with the first item in the list
    if isinstance(obj, list):
        obj = obj[0]

    # some common cases
    if isinstance(obj, Train):
        obj = obj.dataset
        is_dataset = True
    elif isinstance(obj, Model):
        is_dataset = False
    elif isinstance(obj, Dataset) or hasattr(obj, 'get_batch_topo'):
        is_dataset = True

    if is_dataset:
        # obj is a Dataset
        dataset = obj
        examples = dataset.get_batch_topo(rows*cols)
    else:
        # obj is a Model
        model = obj

Thoughts? PR?

Also, the script assumes ['b', 0, 1, 'c'] order from adjust_for_viewer(). Is that stated somewhere in the documentation and I missed it? If not, is it ok to add to adjust_for_viewer docs that:

Returned array should be in ['b', 0, 1, 'c'] order 
(examples, image row, image column, channel).

or something along those lines.

Maybe even better, we can replace the check with

    if examples.shape[3] == 1:
        is_color = False
    elif examples.shape[3] == 3:
        is_color = True
    elif examples.shape[0] == 1:
        is_color = False
        examples = examples.swapaxes(0, 3)
    elif examples.shape[0] == 3:
        is_color = True
        examples = examples.swapaxes(0, 3)
    else:
        print('got unknown image format with', examples.shape[3], 'channels')
        print('supported formats are 1 channel greyscale or three channel RGB')
        quit(-1)

or both. 🎱

It's not so clear to me why a Train object would mean we should extract its training dataset, and not, for instance, the Model in it. I would probably make that an error instead.
adjust_for_viewer does not know anything about the axes; it only works pixel by pixel.
You are right to point out that dataset.get_batch_topo or dataset.get_topological_view could return images in a different format if the axes of the dataset's view_converter are not b01c.
For the case where the dataset is provided, this can be fixed by explicitly building an iterator with a Space that has the appropriate axes. The case where a model is provided is trickier, and I'm not sure yet what the best way would be. One option might be to query the internal Space of the dataset, or the input Space of the model, to get the image dimensions and number of channels, and then build an iterator from design_examples_var with a Conv2DSpace with the appropriate axes.
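
However the batch is obtained, once its axes order is known the conversion to b01c can be done explicitly instead of being guessed from the shape. Below is a minimal NumPy-only sketch of that reordering. The helper name to_b01c is hypothetical and not part of pylearn2, and the axes tuple is assumed to be supplied by the caller (for example, read from wherever the dataset or model records its axes).

```python
import numpy as np


def to_b01c(batch, axes):
    """Reorder a 4-D batch from `axes` order to ('b', 0, 1, 'c').

    Hypothetical helper for illustration only; not a pylearn2 function.
    `axes` is assumed to be a permutation of ('b', 0, 1, 'c').
    """
    target = ('b', 0, 1, 'c')
    axes = tuple(axes)
    if batch.ndim != 4 or sorted(map(str, axes)) != sorted(map(str, target)):
        raise ValueError("expected a 4-D batch and a permutation of "
                         "('b', 0, 1, 'c'), got axes=%r" % (axes,))
    # For each target axis, look up where it currently sits in the input.
    return batch.transpose([axes.index(a) for a in target])


# A c01b batch: 3 channels, 32 rows, 24 columns, 5 examples.
c01b = np.zeros((3, 32, 24, 5))
b01c = to_b01c(c01b, ('c', 0, 1, 'b'))
print(b01c.shape)  # (5, 32, 24, 3)
```

For a ('c', 0, 1, 'b') input this is the same reordering as examples.swapaxes(0, 3) in the proposal above, but it does not depend on the batch size or channel count happening to be 1 or 3.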