This library contains code for handling the DataSet File Format (DSFF), which is based on the XLSX format, and for converting it to ARFF (for use with the Weka framework), CSV, or a FilelessDataset structure (from the Packing Box).
pip install --user dsff
Creating a DSFF from a FilelessDataset
>>> import dsff
>>> with dsff.DSFF() as f:
...     f.write("/path/to/my-dataset")  # folder of a FilelessDataset (containing data.csv, features.json and metadata.json)
...     f.to_arff()  # creates ./my-dataset.arff
...     f.to_csv()   # creates ./my-dataset.csv
...
>>> # while leaving the context, ./my-dataset.dsff is created
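Before writing a folder into a DSFF, it can help to confirm that it holds the three files a FilelessDataset is described as containing. The following is a minimal sketch using only the standard library; `missing_dataset_files` and `EXPECTED_FILES` are hypothetical names for illustration and are not part of the dsff package.

```python
from pathlib import Path

# File names taken from the description above; this helper is hypothetical
# and is not provided by the dsff library.
EXPECTED_FILES = ("data.csv", "features.json", "metadata.json")


def missing_dataset_files(folder):
    """Return the expected file names that are absent from `folder`."""
    root = Path(folder)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_dataset_files("/path/to/my-dataset")` suggests the folder has the expected layout; the check says nothing about whether the file contents are valid.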
Creating a FilelessDataset from a DSFF
>>> import dsff
>>> with dsff.DSFF("/path/to/my-dataset.dsff") as f:
...     f.to_dataset()  # creates ./[dsff-title] with data.csv, features.json and metadata.json
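Once the folder has been created, a quick look at its contents can confirm the conversion produced something usable. The sketch below relies only on the standard library and makes two assumptions that the text above does not state: that the first line of data.csv is a header row, and that each JSON file holds an object. `summarize_dataset` is a hypothetical helper, not part of the dsff package.

```python
import csv
import json
from pathlib import Path


def summarize_dataset(folder):
    """Summarize a FilelessDataset folder (hypothetical helper, not part of dsff)."""
    root = Path(folder)
    # Assumption: the first line of data.csv is a header row, so it is not
    # counted as a data row.
    with (root / "data.csv").open(newline="") as handle:
        line_count = sum(1 for _ in csv.reader(handle))
    summary = {"rows": max(line_count - 1, 0)}
    for name in ("features.json", "metadata.json"):
        with (root / name).open() as handle:
            content = json.load(handle)
        # Assumption: each JSON file holds an object; for any other JSON type,
        # only the Python type name is reported.
        if isinstance(content, dict):
            summary[name] = sorted(content)
        else:
            summary[name] = type(content).__name__
    return summary
```

Counting rows does not depend on the delimiter used in data.csv, which is why the sketch does not try to parse individual columns.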
You may also like these:
- Awesome Executable Packing: A curated list of awesome resources related to executable packing.
- Bintropy: Analysis tool for estimating the likelihood that a binary contains compressed or encrypted bytes (inspired by this paper).
- Dataset of packed ELF files: Dataset of ELF samples packed with many different packers.
- Dataset of packed PE files: Dataset of PE samples packed with many different packers (fork of this repository).
- Docker Packing Box: Docker image gathering packers and tools for making datasets of packed executables.
- PEiD: Python implementation of the well-known Packed Executable iDentifier (PEiD).
- PyPackerDetect: Packing detection tool for PE files (fork of this repository).
- REMINDer: Packing detector using a simple heuristic (inspired by this paper).