A repo to preprocess the Purchase100 dataset, extracted from the Kaggle Acquire Valued Shoppers Challenge.
The authenticity of the downloaded files can be checked with the following MD5 hashes:
- purchase100.npz: 0d7538b9806e7ee622e1a252585e7768
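As a rough sketch of how the check could be done from Python, the snippet below computes a file's MD5 digest with the standard-library hashlib module and compares it to the hash listed above. The helper names md5_of_file and matches_expected are hypothetical and not part of this repo.

```python
import hashlib

# Expected MD5 digest for purchase100.npz, copied from the list above.
EXPECTED_MD5 = "0d7538b9806e7ee622e1a252585e7768"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a large file does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Hypothetical helper: True if the file's MD5 digest equals `expected`."""
    return md5_of_file(path) == expected.lower()
```

Calling matches_expected('./purchase100.npz') should return True if the file is intact. On most systems the command-line tool md5sum (or md5 on macOS) gives the same digest.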
- Download the transactions.csv.gz file from https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data
- Run preprocess_dataset.py <path_to_transactions.csv.gz> to generate purchase100.npz
- You can load the arrays in your code using the numpy.load function:
import numpy as np

data = np.load('./purchase100.npz')
features = data['features']
labels = data['labels']
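If you want a basic sanity check when loading, the loading step could be wrapped along the following lines. This is only a sketch: the function name load_purchase100 is hypothetical, and the check assumes the archive stores one label per feature row, which this README does not state explicitly.

```python
import numpy as np


def load_purchase100(path="./purchase100.npz"):
    """Load the 'features' and 'labels' arrays and check that they line up."""
    # np.load on an .npz file returns a lazy archive; the context manager
    # closes the underlying file once the arrays have been read.
    with np.load(path) as data:
        features = data["features"]
        labels = data["labels"]
    # Assumption (not confirmed by the README): one label per feature row.
    if features.shape[0] != labels.shape[0]:
        raise ValueError(
            f"row mismatch: {features.shape[0]} feature rows "
            f"vs {labels.shape[0]} labels"
        )
    return features, labels
```

A call such as features, labels = load_purchase100() then gives the same two arrays as the snippet above, and raises early if the file is not in the expected layout.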
This work is tested with Python 3.8.5.
The requirements.txt file is automatically generated with pipreqs.
The code in this repo is based on the preprocessing scripts given in https://github.com/bargavj/EvaluatingDPML