In this repo I attempt to reproduce the compression utility described in the paper "DeepSqueeze: Deep Semantic Compression for Tabular Data" by Amir Ilkhechi, Andrew Crotty, Alex Galakatos, Yicong Mao, Grace Fan, Xiran Shi, and Ugur Cetintemel from Brown University.
You can read report.pdf for more information about the original paper, my implementation, my additions, and the results.
There are 3 branches in this repo:
- master, used in my demo presentation
- experiment, mainly used to run experiments and produce results
- mixture_of_experts: since I was not able to achieve better results using the Mixture of Experts architecture, I decided to keep it in a separate branch, reducing the code complexity of the master and demo branches
I suggest running DeepSqueeze from the master branch, which has been cleaned up, by following the steps below:
- Create a Python environment. The DeepSqueeze package was developed with Python 3.8.
- Install the requirements listed in requirements.txt (e.g. pip install -r requirements.txt).
- Download one of the processed tables (no header, only numerical values):
  - Corel dataset
  - Intel Berkeley Research Lab Sensor Data
  - Monitor dataset. Due to its size, I have not uploaded the preprocessed version. You can download it here and preprocess it using notebooks/preprocessing.ipynb.
- Compress the table with the command:
    python compress.py -i path/to/input.csv -o path/to/output/dir/ -e <error_threshold_percentage>
  Note that the -e parameter takes a value in [0, 100], with suggested values of 0.5, 1, 5, and 10.
- Decompress the table with the command:
    python decompress.py -i path/to/compressed_tables.zip
This is the full DeepSqueeze pipeline, with some simplifications described in report.pdf.
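To check a run end to end, one can load the original CSV and the decompressed table and compare them column by column. The following is a minimal sketch, not code from this repo: load_table, column_tolerances and within_error_bound are hypothetical names, and the sketch assumes that -e is interpreted as a percentage of each column's value range (max minus min), which may differ from what compress.py actually does. It also assumes the decompressed output can be loaded as a header-less numeric CSV, like the processed input tables.

```python
import io

import numpy as np


def load_table(csv_source):
    """Hypothetical helper: load a header-less, all-numeric CSV (the
    processed-table format). np.loadtxt raises ValueError on a header
    row or any non-numeric cell."""
    # ndmin=2 keeps a 2-D shape even for a single row or column.
    return np.loadtxt(csv_source, delimiter=",", ndmin=2)


def column_tolerances(table, error_percentage):
    """Hypothetical helper: absolute per-column tolerances, ASSUMING that
    -e is a percentage of each column's value range (max - min)."""
    if not 0 <= error_percentage <= 100:
        raise ValueError("error_percentage must be in [0, 100]")
    table = np.asarray(table, dtype=float)
    return (error_percentage / 100.0) * (table.max(axis=0) - table.min(axis=0))


def within_error_bound(original, reconstructed, error_percentage, slack=1e-12):
    """Hypothetical check: True if every reconstructed value lies within
    its column's tolerance of the corresponding original value."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    if original.shape != reconstructed.shape:
        return False
    tolerance = column_tolerances(original, error_percentage)
    # Broadcasting compares each column against its own tolerance;
    # `slack` absorbs floating-point rounding at the boundary.
    return bool(np.all(np.abs(original - reconstructed) <= tolerance + slack))


if __name__ == "__main__":
    original = load_table(io.StringIO("1,10\n3,30\n5,50\n"))
    print(column_tolerances(original, 5))  # tolerances 0.2 and 2.0
    reconstructed = np.array([[1.1, 11.0], [3.0, 30.0], [5.0, 49.0]])
    print(within_error_bound(original, reconstructed, 5))  # True
```

In practice you would call load_table on path/to/input.csv and on the decompressed file, then pass both arrays to within_error_bound with the same -e value used for compression.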