pickle format of models not readable under Windows
flokadillo opened this issue · 13 comments
On a Windows system, loading a pickled model file
with open('C:/Users/krf/dev/tools/madmom/madmom/models/chords/2016/chords_cnncrf.pkl', 'rb') as f:
a = pickle.load(f)
results in the error:
Traceback (most recent call last):
File "<ipython-input-20-4ec5f8b68ed5>", line 2, in <module>
a = pickle.load(f)
File "C:\Program Files\Anaconda2\lib\pickle.py", line 1384, in load
return Unpickler(file).load()
File "C:\Program Files\Anaconda2\lib\pickle.py", line 864, in load
dispatch[key](self)
File "C:\Program Files\Anaconda2\lib\pickle.py", line 1096, in load_global
klass = self.find_class(module, name)
File "C:\Program Files\Anaconda2\lib\pickle.py", line 1130, in find_class
__import__(module)
ImportError: No module named copy_reg
(See also an entry at stackoverflow).
Apparently the model file was saved in text format.
When I remove the 'b' option when opening the file and execute
with open('C:/Users/krf/dev/tools/madmom/madmom/models/chords/2016/chords_cnncrf.pkl', 'r') as f:
a = pickle.load(f)
everything works fine.
The problem is that in processors.py:Processor.load line 58, pickle files are always loaded from binary files. We should therefore make sure that all model files are exported to the binary format as well. This is apparently not true for:
chords/2016/chords_cnncrf.pkl
chords/2016/chords_dccrf.pkl
It is true (is binary) for:
chords/2016/chords_cnnfeat.pkl
Ok, I see. I think we have 2 options:
- adjust
madmom.processors
to be able to load non-binary pickles - convert the affected models to binary format and make sure that models are always save as binary.
Any thoughts?
Looking at madmom PR #224 I'm not sure if it isn't easier to just update the affected models and save them in binary format...
I investigated how I created the model files in first place, and I think this might be an even more specific problem than I (or, we) thought. My code to save CRF model files does not use the dump
function of the Processor
, but directly calls pickle.dump(crf, open(args['<mm_model>'], 'wb'))
. Note that it uses the binary format!. What it does not specify is the protocol, and it thus uses a different one than if you would save the processor using the provided dump
function.
How can I be sure that I did not change the saving code after creating the models? Unfortunately, I cannot, but it seems unlikely:
- The code to create the CRF models has only one (initial) commit from the 7th of July, 15:56.
- The (broken)
chords_cnncrf.pkl
model was commited on the 7th of July, 14:38. - I don't see why I would have switched from 'w' to 'wb' in the meantime.
Since I do not have a Windows machine available, I cannot test it, maybe @flokadillo can check if the attached files work or not (mdls.zip)
mm_test_crf_direct.pkl
: CRF saved withProcessor.dump
mm_test_crf.pkl
: CRF saved with the code above.mm_test_crf_txt.pkl
: CRF saved with the code above, but using 'w' instead of 'wb'
The latter two are identical, i.e. it doesn't matter whether you use w
or wb
, they are always saved in text (non-binary) form. The first one is binary, and thus also a third in size.
If mm_test_crf_direct.pkl
works on Windows (which I assume), I'd vote for updating the models.
I checked the three files that @fdlm provided on a Windows 8 system. All three of them can be opened using the binary flag (with open('file.pkl', 'rb')
). All except the first (mm_test_crf_direct.pkl
) can be opened without the binary flag (with open('file.pkl', 'r')
) as well. With the first file, I get a KeyError: ´\x1d´
.
This makes sense, because the mm_test_crf_direct.pkl
was saved in saved in binary mode. The question is now: why does loading the other models work in Win8, while loading the ones in the repo does not?
Anyways, I "converted" the CRF models in the master branch by calling
m = pickle.load(open(filename))
m.dump(filename)
and attached them here, so we can check for sure if they work. In the attachment, you can also find the current models in the master branch which should not work according to this issue. Would you mind checking this, @flokadillo ?
Hmm strange. Apparently we have two different versions of the model files here.
The files that you provided do indeed all work with the binary flag. However, they are different to the ones that were in my madmom/models
folder.
chords_cnncrf.pkl
in mymadmom/models
folder (md5sum 2a0c427df5da8aa7e5f026924d93dc3e): cannot be loaded with the binary flag (it also works without the flag, though). This file is from Nov 16, 2016.chords_cnncrf.pkl
as provided by your attachment is the most recent version ofmadmom_models
(md5sum 45e9ff594f07493ced7d0803ff7df58c): and can be loaded with and without the binary flag
Ok, I am checking where the models version come frome that I have on my computer (madmom was installed on Nov 16). I have no clue why I don't have the most current version of the file, even after executing git pull
in the madmom/models
folder...
Can you attach or send me the file you have on your computer?
I am just re-installing madmom to check with the supplied models. Unfortunately, this is not too easy with Windows and may take a while. Here is the old , non-working model:
This is getting fun!
The difference between the file you provided and the one I sent out is the handling of new lines between Windows and Linux: @flokadillo's has Windows-style newlines, mine has Unix-style newlines. So, let's summarise and look at the MD5:
chords_cnncrf.pkl
as provided by @fdlm: 45e9ff594f07493ced7d0803ff7df58cchords_cnncrf.pkl
as provided by @flokadillo: 2a0c427df5da8aa7e5f026924d93dc3echords_cnncrf.pkl
as provided by @flokadillo run throughdos2unix
: 45e9ff594f07493ced7d0803ff7df58c
Wild guess: did git replace the new-lines in the models that are stored as text pickles? (see https://help.github.com/articles/dealing-with-line-endings/)?
Ok, this sounds reasonable. But it looks like that saving the models in binary mode instead of text format solves the problem (no matter if it originates from git's behaviour or somewhere else). This would also make the models smaller.
Or did I miss something?
Yes, I guess we should replace the affected model files and that's it.
Agreed! Good detective work, @fdlm!