This repository contains pickled numpy data for multilabel classification datasets, for easy research use.
I like to think of this as my mini-contribution. Each directory contains a single dataset, with the respective pickles and a count.txt
file giving the dimensions of the features and labels, in that order.
```
dataset_name
|____________ dataset_name-train-features.pkl
|____________ dataset_name-test-features.pkl
|____________ dataset_name-train-labels.pkl
|____________ dataset_name-test-labels.pkl
```
```python
import numpy as np

def get_data(path, noise=False):
    # The .pkl files are numpy pickles, so allow_pickle=True is required.
    data = np.load(path, allow_pickle=True)
    if noise:
        # Optionally jitter the data with small Gaussian noise.
        data = data + np.random.normal(0, 0.001, data.shape)
    return data

x_train = get_data("./dataset_name/dataset_name-train-features.pkl")
```
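A quick way to sanity-check the loader is a round trip on a dummy array. This sketch assumes the repository's `.pkl` files are plain numpy pickles (e.g. written with `ndarray.dump()`), which is why `np.load` needs `allow_pickle=True`:

```python
import numpy as np

# Dummy array standing in for real features (assumption: the repository's
# .pkl files are plain numpy pickles written with ndarray.dump()).
x = np.arange(6, dtype=np.float64).reshape(2, 3)
x.dump("demo-features.pkl")  # writes the array as a pickle file

# np.load falls back to the pickle loader when the file is not in .npy
# format, provided allow_pickle=True is passed.
loaded = np.load("demo-features.pkl", allow_pickle=True)
```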
To convert your standard ARFF files to numpy pickles, which are easier to use and faster to process, use the script included alongside the datasets.
Download the dataset and put it in this folder. Create a count.txt
in the directory and put the dimensions in the following format:

```
features_dim
labels_dim
```
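Reading those two dimensions back out is a one-liner per line. A minimal sketch, using a hypothetical `read_dims` helper and throwaway example values (not taken from any real dataset):

```python
from pathlib import Path

# Hypothetical helper: read the two dimensions from a dataset's count.txt
# (features_dim on the first line, labels_dim on the second).
def read_dims(count_path):
    values = Path(count_path).read_text().split()
    return int(values[0]), int(values[1])

# Demo with a throwaway count.txt and made-up dimensions.
Path("count.txt").write_text("294\n6\n")
print(read_dims("count.txt"))  # -> (294, 6)
```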
```
pip3 install liac-arff
python3 to_numpy --dataset dataset_name
```
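The conversion itself boils down to: read the rows after the `@data` marker, split each row into the first `features_dim` columns and the next `labels_dim` columns, and dump both halves as numpy pickles. The actual script uses liac-arff; the sketch below is a simplified stand-in with a handwritten parser and an inline toy ARFF string, so every name here is illustrative:

```python
import numpy as np

# Toy ARFF content: 2 feature columns, 1 label column.
ARFF = """@relation demo
@attribute f1 numeric
@attribute f2 numeric
@attribute l1 {0,1}
@data
0.5,1.2,1
0.3,0.7,0
"""

def arff_to_arrays(text, features_dim, labels_dim):
    # Collect the rows after the @data marker, skipping header/comment lines.
    rows, in_data = [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        if line.lower() == "@data":
            in_data = True
            continue
        if in_data:
            rows.append([float(v) for v in line.split(",")])
    matrix = np.array(rows)
    # Features come first, labels follow, matching count.txt's ordering.
    return (matrix[:, :features_dim],
            matrix[:, features_dim:features_dim + labels_dim])

features, labels = arff_to_arrays(ARFF, features_dim=2, labels_dim=1)
features.dump("demo-train-features.pkl")  # ndarray.dump writes a pickle
labels.dump("demo-train-labels.pkl")
```

The dumped files can then be loaded with `np.load(path, allow_pickle=True)`, exactly like the pickles shipped in this repository.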