This is a list of file formats that are used in ML and contain pickles. It is intended to help when creating detections for files that could be used to take over any system that opens them, thanks to fickling and pickle_wrapper. For the most part these extensions are purely conventions, and people can use whatever extension they like, so treat any file written by one of these libraries as a potential pickle, whatever it is named.
TODO: add samples of these files to test injection and detection techniques. Feel free to provide merge requests with some; however, please try to keep them smaller than normal, because I'll have to manually review the operations of each pickle to ensure it's not truly malicious.
PyTorch models and state dicts are commonly saved using torch.save with the extension .pt or .pth. PyTorch checkpoints are commonly saved with the extension .ckpt.
PyTorch files created with version 1.6 or later use a zipfile-based format by default. These still contain a pickle, but the file itself is not just a straight pickle. torch.load still parses straight pickle files, and passing _use_new_zipfile_serialization=False to torch.save creates a file in the old format, which is just a pickle.
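
To tell the two layouts apart without unpickling anything, a standard-library check along these lines may be enough. It is only a sketch: describe_torch_file is a hypothetical helper name, the ".pkl" member suffix is an assumption based on the data.pkl entry found in zip-format files, and the leading 0x80 byte only identifies pickles written with protocol 2 or later.

```python
import zipfile


def describe_torch_file(path):
    """Guess the container of a .pt/.pth/.ckpt file without unpickling it.

    Returns a (kind, pickle_members) tuple. This is a heuristic, not a parser.
    """
    if zipfile.is_zipfile(path):
        # Zip-based format: the pickle lives inside the archive.
        with zipfile.ZipFile(path) as archive:
            members = [n for n in archive.namelist() if n.endswith(".pkl")]
        return "zip", members
    with open(path, "rb") as handle:
        first_byte = handle.read(1)
    # Pickle protocol 2+ streams start with the PROTO opcode, 0x80.
    if first_byte == b"\x80":
        return "raw-pickle", []
    return "unknown", []
```

Either result means the file should be treated as untrusted input; the check only tells you where the pickle is.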
numpy.save and numpy.savez both fall back to using pickle if you save something that they can't otherwise encode.
In my limited tests, save prepends some NumPy metadata to the file, so the Linux file command reports it as a NumPy array; however, if you partition the file contents on \n, everything after the first newline is a standard pickle. save does support allow_pickle=False to ensure this doesn't happen, and numpy.load defaults to allow_pickle=False.
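
The metadata in front of the pickle is the .npy header, and it records the dtype, so a detection can read it instead of splitting on a newline. The sketch below uses only the standard library; npy_contains_pickle is a hypothetical name, and treating any "O" in the descr field as an object dtype is a deliberately conservative assumption that can over-report on structured dtypes.

```python
import ast
import struct

NPY_MAGIC = b"\x93NUMPY"


def npy_contains_pickle(data: bytes) -> bool:
    """Return True if the .npy header declares an object dtype.

    Object arrays are the case where the payload after the header is a pickle.
    """
    if data[:6] != NPY_MAGIC:
        raise ValueError("not an .npy file")
    major = data[6]
    if major == 1:
        # Format version 1.x: 2-byte little-endian header length.
        (header_len,) = struct.unpack("<H", data[8:10])
        start = 10
    else:
        # Format versions 2.x and 3.x: 4-byte little-endian header length.
        (header_len,) = struct.unpack("<I", data[8:12])
        start = 12
    # The header is a Python dict literal, padded and newline-terminated.
    header = ast.literal_eval(data[start:start + header_len].decode("latin1"))
    return "O" in str(header["descr"])
```

Nothing here unpickles the payload, so it is safe to run on untrusted files.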
savez simply creates a zip archive, as the name implies, that combines multiple npy files into one. While this is incredibly useful, there is no allow_pickle option on savez, so it's extra important not to change numpy.load's default of allow_pickle=False.
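
Because an .npz is an ordinary zip of .npy members, each member can be checked without calling numpy.load at all. This is a crude sketch: npz_object_members is a hypothetical name, and it assumes the header of a plain object array contains the literal text 'descr': '|O' within the first 256 bytes, so it will miss object fields nested inside structured dtypes.

```python
import zipfile


def npz_object_members(path):
    """List members of an .npz whose .npy header mentions an object dtype."""
    flagged = []
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Only the start of each member is needed to see its header.
            with archive.open(name) as member:
                prefix = member.read(256)
            if prefix.startswith(b"\x93NUMPY") and b"'descr': '|O'" in prefix:
                flagged.append(name)
    return flagged
```

An empty result does not prove the archive is safe; it only means this simple pattern did not match.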
I'm not entirely sure what joblib.dump does differently from pickle, but the end result is a pickle.
A pickle's a pickle
Some libraries tend to have content saved using pickle.dump directly. Below is a list of ML libraries this applies to, according to Stack Overflow results.
- NLTK
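
For plain pickles like these, the opcode stream can be inspected without executing it. The sketch below uses the standard library's pickletools rather than fickling; suspicious_opcodes and the SUSPICIOUS set are hypothetical names, and the choice of opcodes is an assumption about what deserves a closer look. Legitimate ML pickles use these opcodes too, so a hit is a prompt for manual review and not a verdict.

```python
import pickletools

# Opcodes that import names or call/construct objects during unpickling.
SUSPICIOUS = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST",
    "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD",
}


def suspicious_opcodes(data: bytes) -> set:
    """Return the subset of SUSPICIOUS opcodes present in a pickle stream.

    genops only parses the stream; it never runs the pickle.
    """
    seen = {opcode.name for opcode, _arg, _pos in pickletools.genops(data)}
    return seen & SUSPICIOUS
```

A pickle of plain lists, dicts, strings and numbers should come back with an empty set, which makes this useful as a first filter before a manual review.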
Archive that contains other formats and configs. The contained formats include some that themselves are pickles or contain pickles.