lisa-lab/pylearn2

[bug] can't run gpu_pkl_to_cpu_pkl.py

TNick opened this issue · 3 comments

As you can see in the traceback I had a problem converting between gpu and cpu variants.

I've added the following after this line:

        elif isinstance(obj, (types.FunctionType, types.BuiltinFunctionType)):
            print(prefix + "skipping a function (can't pickle function objects)")
            rval = None

I've also modified these lines:

        if hasattr(obj, 'set_value'):
            # Base case: we found a shared variable, must convert it
            rval = shared(obj.get_value())
            try:
                rval.name = obj.name
            except AttributeError:
                pass
            # Sabotage its getstate so if something tries to pickle it, we'll find out
            obj.__getstate__ = None

The script only seems to work with device=cpu, so we could save future users some time by adding:

# ...
if __name__ == '__main__':
    # theano.config.device is read-only so we change the value in environment
    # before importing theano
    # .get() avoids a KeyError when THEANO_FLAGS is not set at all
    thflags = os.environ.get('THEANO_FLAGS', '')
    if thflags:
        thflags = thflags + ",device=cpu"
    else:
        thflags = "device=cpu"
    os.environ['THEANO_FLAGS'] = thflags

    _, in_path, out_path = sys.argv
    # ...
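A slightly more defensive variant of the same idea, as a sketch only. It assumes THEANO_FLAGS is a comma-separated list of key=value entries and drops any existing device=... entry rather than relying on which duplicate Theano would honor. The helper names `with_cpu_device` and `force_cpu_env` are hypothetical and not part of the script:

```python
import os


def with_cpu_device(flags):
    """Return a THEANO_FLAGS string whose device entry is device=cpu."""
    # Split on commas and drop empty pieces (e.g. from an empty string).
    entries = [e.strip() for e in flags.split(",") if e.strip()]
    # Remove any device=... entry that is already present.
    kept = [e for e in entries if not e.startswith("device=")]
    kept.append("device=cpu")
    return ",".join(kept)


def force_cpu_env(environ=os.environ):
    """Rewrite THEANO_FLAGS in the given mapping; call before importing theano."""
    environ["THEANO_FLAGS"] = with_cpu_device(environ.get("THEANO_FLAGS", ""))
    return environ["THEANO_FLAGS"]
```

As with the snippet above, this only has an effect if it runs before theano is imported.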

I was then able to run show_weights.py, browse_conv_weights.py, num_parameters.py, pkl_inspector.py, plot_monitor.py, etc. I did not try to fprop() in any way.

I am aware of the discussion in pylearn-users and the comments by @goodfeli in the file header.
Anyone facing issues with gpu_pkl_to_cpu_pkl.py should read this thread.

Regarding the first point, maybe the reason it is not able to pickle that function is that it existed in the version of Pylearn2 that saved the model in the first place, but does not exist in the current version. In that case, I'm not sure we should always skip them. I don't know what the right way of detecting that would be, except by trying to pickle it to a string and seeing if it works.
Regarding the second point, yes, keeping the name is a good idea. I would probably call hasattr rather than use try/except, but it does not really matter.
Regarding the last point, sure.
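
A minimal sketch of the "try to pickle it to a string" check mentioned for the first point. The helper name `is_picklable` is hypothetical and not part of Pylearn2 or the script; it only illustrates the idea:

```python
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Functions that cannot be found by name in their module end up here.
        return False
    return True
```

With something like this, the conversion could skip only the functions that actually fail to pickle instead of skipping every function. Note that it serializes the object once just to test it, which may be slow for large objects.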

I happened to have test_print_monitor_cv.py open in an editor, so I copied it to a GPU-equipped machine, modified it to keep the saved file, and ran it. In the same console I then ran gpu_pkl_to_cpu_pkl.py and was able to reproduce the error. There was no difference in environment, pylearn2 version, etc.

But your comment made me realize that there is no _check_is_symbolic in pylearn2.space. Previously I just assumed that it exists and never checked.

As it turns out, there is a VectorSpace object whose __reduce_ex__() method, inherited from object, returns (among other things) this funky dictionary:

{'dim': 10, 
'_check_is_symbolic': <function _check_is_symbolic at 0x7ff5a9c2d1b8>, 
'validate_callbacks': [],
'sparse': False,
'_dtype': 'float32',
'_check_is_numeric': <function _check_is_numeric at 0x7ff5a9c2d140>,
'np_validate_callbacks': []}

After that, pickle tries to save that dictionary, attempts to locate _check_is_symbolic in pylearn2.space, and fails (obj is the VectorSpace, rv is more or less the dictionary):

> /home/ubuntu/devel/anaconda/lib/python2.7/pickle.py(331)save()
-> self.save_reduce(obj=obj, *rv)
  /home/ubuntu/devel/anaconda/lib/python2.7/pickle.py(419)save_reduce()
-> save(state)
  /home/ubuntu/devel/anaconda/lib/python2.7/pickle.py(286)save()
-> f(self, obj) # Call unbound method with explicit self
  /home/ubuntu/devel/anaconda/lib/python2.7/pickle.py(649)save_dict()
-> self._batch_setitems(obj.iteritems())
  /home/ubuntu/devel/anaconda/lib/python2.7/pickle.py(663)_batch_setitems()
-> save(v)
  /home/ubuntu/devel/anaconda/lib/python2.7/pickle.py(286)save()
-> f(self, obj) # Call unbound method with explicit self
  /home/ubuntu/devel/anaconda/lib/python2.7/pickle.py(748)save_global()
-> (obj, module, name))

A newly constructed VectorSpace's __reduce_ex__() returns, amongst other things:

{'dim': 1, 
'_dtype': 'float32',
'validate_callbacks': [],
'sparse': False,
'np_validate_callbacks': []}

Both _check_is_symbolic() and _check_is_numeric() are tagged as @staticmethod.
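
To check which attributes of a loaded object carry function objects in their pickled state, something like the following could be used. This is only a sketch: `function_valued_state` is a hypothetical helper, and it assumes the object goes through the default `object.__reduce_ex__` path, where the third element of the returned tuple is the instance state dictionary:

```python
import types


def function_valued_state(obj, protocol=2):
    """List the keys of obj's pickled state whose values are functions."""
    rv = obj.__reduce_ex__(protocol)
    # The default reduce tuple is (callable, args, state, ...); state may be absent.
    state = rv[2] if len(rv) > 2 else None
    if not isinstance(state, dict):
        return []
    function_types = (types.FunctionType, types.BuiltinFunctionType)
    return sorted(k for k, v in state.items() if isinstance(v, function_types))
```

For the two dictionaries shown above, this would be expected to report `_check_is_numeric` and `_check_is_symbolic` for the loaded object and nothing for the newly constructed one.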

I have no elegant idea for how to fix this.

These changes are now part of pyl2extra.
Feel free to update the script from there if you see fit.
Closing this issue down as it has been inactive for a while.