How to use it to evaluate on other datasets and for other embedding algorithms?
herdonyan opened this issue · 2 comments
Should I change the dataset into a csv file or excel file or other formats?
Which lines or files should I change if I want to use a new dataset and a new embedding algorithm for evaluation while keeping the awesome hyper-parameter tuning mechanisms?
How to add new datasets
First, download and unpack the data as described here. A new data/ directory will appear in the repository; it contains the datasets used in the paper.
Then add your dataset to the data/ directory, following the format of the other datasets. Let's say your dataset's name is iris. Create the directory data/iris and use np.save to write the following content:
- (only if the dataset has numerical features) X_num_train.npy, X_num_val.npy, X_num_test.npy (numpy arrays of float32)
- (only if the dataset has categorical features) X_cat_train.npy, X_cat_val.npy, X_cat_test.npy (numpy arrays of strings)
- y_train.npy, y_val.npy, y_test.npy (numpy arrays of {float32 for regression, int64 for classification}); for classification, the classes must be from range(n_classes)
- info.json: see this file for other datasets to see its content
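As a rough illustration of the layout above, here is a minimal sketch for a classification dataset with numerical features only. The helper name `save_split_arrays`, the split fractions, and the random shuffling are assumptions made for this example, not part of the repository. It does not write info.json; copy that from an existing dataset and adapt it by hand.

```python
from pathlib import Path

import numpy as np


def save_split_arrays(dataset_dir, X_num, y, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle, split into train/val/test, and save arrays under the expected file names.

    Hypothetical helper: the split sizes and the seed are arbitrary choices.
    """
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)

    # Map arbitrary class labels (e.g. strings) to integers 0..n_classes-1,
    # so that the classes come from range(n_classes).
    classes, y_encoded = np.unique(np.asarray(y), return_inverse=True)
    y_encoded = y_encoded.reshape(-1).astype(np.int64)

    X_num = np.asarray(X_num)
    order = np.random.default_rng(seed).permutation(len(y_encoded))
    n_test = int(len(order) * test_fraction)
    n_val = int(len(order) * val_fraction)
    parts = {
        "test": order[:n_test],
        "val": order[n_test:n_test + n_val],
        "train": order[n_test + n_val:],
    }

    for part, idx in parts.items():
        # Numerical features must be float32; classification labels must be int64.
        np.save(dataset_dir / f"X_num_{part}.npy", X_num[idx].astype(np.float32))
        np.save(dataset_dir / f"y_{part}.npy", y_encoded[idx])

    # Return the original labels in the order of their integer codes.
    return classes
```

For example, `save_split_arrays("data/iris", X, y)` with `X` of shape `(150, 4)` and `y` holding 150 species names would produce the six .npy files in data/iris. For regression, skip the label encoding and save `y` as float32 instead.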
Let's say you want to run the tuning & evaluation pipeline for MLP on your dataset. Then copy any existing config (for example, this one) and change the path inside the config to point to your dataset ("data/iris" instead of "data/california").
Full script:
export CUDA_VISIBLE_DEVICES="0"
mkdir exp/mlp/iris
cp exp/mlp/california/0_tuning.toml exp/mlp/iris/0_tuning.toml
# <edit the new config as described above>
python bin/tune.py exp/mlp/iris/0_tuning.toml
python bin/evaluate.py exp/mlp/iris/0_tuning 15
python bin/ensemble.py exp/mlp/iris/0_evaluation
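If you prefer not to edit the config by hand, the copy-and-edit step of the script can be sketched in Python. This assumes the dataset path appears in the config as the literal string "data/california"; the function name `copy_config_with_new_dataset` is made up for this example. Any other dataset-specific settings in the config still need a manual check.

```python
from pathlib import Path


def copy_config_with_new_dataset(src, dst, old_path="data/california", new_path="data/iris"):
    """Copy a config file, replacing every occurrence of old_path with new_path.

    Plain text replacement: no assumptions are made about the config's keys.
    """
    src, dst = Path(src), Path(dst)
    text = src.read_text()
    if old_path not in text:
        # Fail loudly instead of silently copying a config that still points elsewhere.
        raise ValueError(f"{old_path!r} not found in {src}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(text.replace(old_path, new_path))
    return dst
```

Calling `copy_config_with_new_dataset("exp/mlp/california/0_tuning.toml", "exp/mlp/iris/0_tuning.toml")` would then stand in for the mkdir, cp, and manual edit lines.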
How to add new embedding algorithms
I don't understand the question :) You can use bin/train4.py as a starting point.
Feel free to reopen the issue if needed.