jzbontar/mc-cnn

Problem loading model - 'Failed to load function from bytecode:' Error while loading model file

ShreyasSkandan opened this issue · 3 comments

Hi,

So I'm currently trying to load a network model via Torch on an Nvidia TX1. When I try to load the model

net = torch.load('modelfile.t7','ascii')

I get the following error:

[screenshot: bytecode-error]
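As a side note, the error can be inspected programmatically; here is a minimal illustrative sketch (not from the original report) that traps the deserialization failure instead of letting it abort the session:

```lua
-- Illustrative: catch the deserialization error instead of crashing.
require 'torch'

local ok, err = pcall(torch.load, 'modelfile.t7', 'ascii')
if not ok then
   -- err carries the "Failed to load function from bytecode" message
   print('load failed: ' .. tostring(err))
end
```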

The model loads fine on my Ubuntu 14.04 desktop, so I tried loading the model there, converting it to binary, and then loading the converted file
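The desktop-side round trip can be sketched as follows (filenames assumed; `torch.save` writes Torch's default binary format when no format argument is given):

```lua
-- On the Ubuntu 14.04 desktop, where the ASCII model loads fine:
require 'torch'

local net = torch.load('modelfile.t7', 'ascii')  -- read the ASCII model
torch.save('modelfile.bin', net)                 -- re-save in binary format
```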

net = torch.load('modelfile.bin')

But I still get a similar error:

[screenshot: binary_error_model]

I've noticed that a few people have hit the same errors in the past, and most of them seem to have gotten past this by using an 'ascii' version of the model, since it is (supposedly) platform-independent. I've had no luck with that. The other group who faced this problem were on 32-bit systems, but my Nvidia TX1 is currently running Ubuntu 16.04 (64-bit).

For anyone willing to recreate these results:

I installed JetPack (JetPack-L4T-2.3.1-linux-x64.run) and verified that my installations of CUDA 8.0 and OpenCV are functional.

For Torch, I used dusty-nv's installation script
https://github.com/dusty-nv/jetson-reinforcement
The installation script in particular is https://github.com/dusty-nv/jetson-reinforcement/blob/master/CMakePreBuild.sh
It all looks pretty straightforward.

The specific model file is https://s3.amazonaws.com/mc-cnn/net_kitti_fast_-a_train_all.t7

Any tips on how to fix this problem are greatly appreciated. If anyone has ideas on how I can tweak the model on my desktop machine to make it work on the TX1, I'd love to hear them.

Thanks in advance,

Shreyas

Shreyas, I don't have a Jetson nearby, so I'm afraid I can't really help you much. If you find a solution please let me know.

Hi @jzbontar thanks for responding to this. So I looked into it some more and it seems as though only the fast models suffer from this problem. The slow models (I tried kitti & kitti2015) seem to load just fine.

Was there a difference in the way these models were exported? I looked through the code and noticed that both were saved as ASCII models, but it seems as though the fast models end up as binary files?

Hi,

So I managed to figure out (at least somewhat) what was causing the problem, and how to fix it.

On taking a closer look at the fast architecture's model specification and its ASCII file, I noticed what seems to be a pointer to a file/function located on your computer [ @jzbontar ]. That appears to be what makes the ASCII file unreadable on the TX1; for a reason unknown to me, it slips through undetected on my desktop.

Model specification:
My hypothesis is that the serialized `accUpdateGradParameters : function : X` entry is the cause of this problem.
[screenshot: torch_model_error]
This can also be verified by looking at the ASCII version of the fast architecture model(s):
[screenshot: modelerror]

The fix is either to load the model on a machine that accepts it, remove those function fields, and re-save it as an ASCII file, or to simply re-train the models from scratch.
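The clean-up route can be sketched in Torch roughly as below. This is a reconstruction of the workaround, not the exact code used in the thread; the filenames and the choice to simply nil out the per-instance field are assumptions, and it must be run on a machine that can already load the model:

```lua
-- Run on a machine that loads the model fine (e.g. the Ubuntu 14.04 desktop).
require 'torch'
require 'nn'
require 'cunn'  -- the mc-cnn models contain CUDA layers

local net = torch.load('net_kitti_fast_-a_train_all.t7', 'ascii')

-- nn.Module:apply visits every module in the network; clearing the
-- per-instance accUpdateGradParameters field drops the serialized
-- bytecode and lets each module fall back to the stock method.
net:apply(function(m)
   m.accUpdateGradParameters = nil
end)

torch.save('net_kitti_fast_fixed.t7', net, 'ascii')
```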

Here is what the model specification looks like in a working model file:
[screenshot: torch_model_fixed]

If anyone is interested in freshly re-trained models that work on the TX1, I've attached links to them below:
KITTI 2012 Fast Architecture Models

net_kitti_fast_-a_train_all.t7

net_kitti_fast_-a_train_tr.t7

Thanks for the help and I hope this post helps anyone else stuck in the same boat.