How to use it to evaluate on other datasets and for other embedding algorithms?

Question

herdonyan opened this issue a year ago · 2 comments

Should I convert the dataset into a CSV file, an Excel file, or some other format?
Which lines or files should I change if I want to evaluate a new dataset and a new embedding algorithm while keeping the awesome hyper-parameter tuning mechanisms?

Answer 1 · 2023-05-10T10:40:04.000Z

How to add new datasets

First, download and unpack the data as described here. You will see the new data/ directory in the repository. In the directory, there are datasets used in the paper.

Then, add your dataset to the data/ directory, following the format of the other datasets. Let's say your dataset's name is iris. Create the directory data/iris and use np.save to write the following files into it:

(only if the dataset has numerical features) X_num_train.npy, X_num_val.npy, X_num_test.npy (numpy arrays of float32)
(only if the dataset has categorical features) X_cat_train.npy, X_cat_val.npy, X_cat_test.npy (numpy arrays of strings)
y_train.npy, y_val.npy, y_test.npy (numpy arrays of float32 for regression or int64 for classification); for classification, the classes must be from range(n_classes)
info.json: look at this file in the other datasets' directories to see what it should contain
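
The file layout above can be sketched in Python roughly as follows. This is a minimal illustration, not code from the repository: save_dataset is a hypothetical helper, the data is random rather than the real iris dataset, and the output goes to a temporary directory instead of data/. It only writes the .npy files; info.json is left out on purpose, since its fields should be copied from an existing dataset.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_dataset(root, name, splits):
    """Write every array as <root>/<name>/<key>_<split>.npy (hypothetical helper)."""
    out_dir = Path(root) / name
    out_dir.mkdir(parents=True, exist_ok=True)
    for split, arrays in splits.items():
        for key, array in arrays.items():
            np.save(out_dir / f"{key}_{split}.npy", array)
    return out_dir


# Random stand-in data: 4 numerical features, 3 classes, no categorical features.
rng = np.random.default_rng(0)
n_classes = 3
sizes = {"train": 90, "val": 30, "test": 30}
splits = {
    split: {
        "X_num": rng.normal(size=(n, 4)).astype(np.float32),
        "y": rng.integers(0, n_classes, size=n).astype(np.int64),
    }
    for split, n in sizes.items()
}

# For classification, the labels must come from range(n_classes).
for arrays in splits.values():
    assert set(np.unique(arrays["y"]).tolist()) <= set(range(n_classes))

# A temporary directory stands in for the repository's data/ directory.
out_dir = save_dataset(tempfile.mkdtemp(), "iris", splits)
saved_files = sorted(p.name for p in out_dir.iterdir())
```

For a dataset with categorical features, the same helper would take an extra "X_cat" entry per split holding a numpy array of strings.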

Let's say you want to run the tuning & evaluation pipeline for MLP on your dataset. Then copy any existing config (for example, this one) and change the path inside the config to point to your dataset ("data/iris" instead of "data/california").

Full script:

export CUDA_VISIBLE_DEVICES="0"
mkdir exp/mlp/iris
cp exp/mlp/california/0_tuning.toml exp/mlp/iris/0_tuning.toml
# Edit exp/mlp/iris/0_tuning.toml as described above before continuing.
python bin/tune.py exp/mlp/iris/0_tuning.toml
python bin/evaluate.py exp/mlp/iris/0_tuning 15
python bin/ensemble.py exp/mlp/iris/0_evaluation
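
If you prefer not to edit the copied config by hand, the copy-and-edit step could be done with a plain text substitution, sketched below. retarget_config is a hypothetical helper, not part of the repository, and it assumes the dataset path appears in the config literally as data/california. It makes no assumption about the TOML key names. The demo runs on a throwaway file whose single line is a made-up stand-in for the real config.

```python
import tempfile
from pathlib import Path


def retarget_config(src, dst, old="data/california", new="data/iris"):
    """Copy a config from src to dst, swapping the dataset path on the way."""
    text = Path(src).read_text()
    if old not in text:
        # Fail loudly instead of silently writing an unchanged copy.
        raise ValueError(f"{old!r} not found in {src}")
    dst = Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(text.replace(old, new))
    return dst


# Demo in a temporary directory. The line written below is a placeholder,
# NOT the real contents of exp/mlp/california/0_tuning.toml.
root = Path(tempfile.mkdtemp())
src = root / "exp/mlp/california/0_tuning.toml"
src.parent.mkdir(parents=True)
src.write_text('path = "data/california"\n')

dst = retarget_config(src, root / "exp/mlp/iris/0_tuning.toml")
new_text = dst.read_text()
```

In the repository itself, the call would be retarget_config("exp/mlp/california/0_tuning.toml", "exp/mlp/iris/0_tuning.toml"), followed by a quick look at the result to confirm nothing else needs changing.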

How to add new embedding algorithms

I don't understand the question :) You can use bin/train4.py as a starting point.

Answer 2 · 2023-05-16T13:15:58.000Z

Feel free to reopen the issue if needed.