allegro/allRank

the models folder and predictions folder are empty!

Opened this issue · 5 comments

Hello there,
Thank you for the package.
I was able to run it successfully, but I am wondering how I can see the prediction results on a test set. Assuming I have trained a model, I now want to load it and see the actual predictions on a test set as well as the metrics.
However, in the output folder, the models folder and the predictions folder are empty! I do have the log file, which shows a successful run.

I can dig into the code myself, but I thought you might already know the reason.

Thank you for your message. The model should be saved to the model.pkl file. The models and predictions directories are remnants of our internal allRank fork, where we saved the model after each iteration and also dumped the dataset predictions for the final model. We will add these files in upcoming allRank releases.
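
For reference, a minimal sketch of loading the saved weights outside the training pipeline (this assumes model.pkl holds the model's state_dict and that `model` has already been rebuilt with make_model and the training config, as in the snippet later in this thread):

import torch

# model.pkl is assumed to hold the model's state_dict; rebuild `model` first
# with make_model and the same config that was used for training
state_dict = torch.load('model.pkl', map_location='cpu')
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode before scoring slates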

Thank you for your reply. Yes, I can see the model.pkl.

Also, could you help me with how to use this saved model for prediction on a test set? I mean, I want to see the model's actual final predictions.

Hello there,
I wrote a piece of code to dump the predictions for a test set. I made some minor changes to LibSVMDataset to keep the query_ids and their order in the test set, which are needed when pairing the ground truth with the predicted order.


from collections import Counter

class LibSVMDataset(Dataset):
    ...
    def __init__(self, X, y, query_ids, transform=None):
        ...
        X = X.toarray()

        # keep the qids in the order in which they appear in the input file
        self.query_ids = Counter(query_ids)
        groups = np.cumsum(list(self.query_ids.values()))

        self.X_by_qid = np.split(X, groups)[:-1]
        self.y_by_qid = np.split(y, groups)[:-1]
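
As a quick illustration of what the Counter-based grouping does (a standalone toy example, not allRank code):

import numpy as np
from collections import Counter

# toy qids in file order: three rows for qid 1, then two rows for qid 7
query_ids = [1, 1, 1, 7, 7]
X = np.arange(10).reshape(5, 2)

qid_counts = Counter(query_ids)                # {1: 3, 7: 2}; iteration follows first-appearance order (Python 3.7+)
groups = np.cumsum(list(qid_counts.values()))  # array([3, 5])
X_by_qid = np.split(X, groups)[:-1]            # rows 0-2 and rows 3-4; the trailing empty split is dropped

print([a.shape for a in X_by_qid])             # [(3, 2), (2, 2)]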

# the helpers below (load_libsvm_dataset_role, create_data_loaders, Config, make_model,
# load_state_dict_from_file, __rank_slates, compute_metrics, all_equal) come from the allrank package
def test():
    topn = 10
    test_path = '../'

    # create dataset and dataloader instances
    test_ds = load_libsvm_dataset_role('test', test_path, topn)
    _, test_dl = create_data_loaders(test_ds, test_ds, num_workers=1, batch_size=test_ds.shape[0])

    n_features = test_ds.shape[-1]
    assert all_equal([n_features]), f"Last dimensions of datasets must match but got {n_features}"

    # load the trained model using the config it was trained with
    dev = get_torch_device()  # torch device, obtained the same way as in allRank's main.py
    config = Config.from_json(f'{test_path}/used_config.json')
    model = make_model(n_features=n_features, **asdict(config.model, recurse=False))
    model.load_state_dict(load_state_dict_from_file(f'{test_path}/model.pkl', dev))

    # rerank each slate with the trained model
    x_, y_ = __rank_slates(test_dl, model)

    # write one row per document: qid, entity id (kept in the last feature column in my setup),
    # a rank-based score, and the true label sorted by predicted order
    with open(f'{test_path}/test.pred.csv', 'w') as f:
        f.write('qid,eid,pred_score,true_sorted_by_pred\n')
        for i, (qid, count) in enumerate(test_dl.dataset.query_ids.items()):
            for j in range(count):
                f.write(f'{qid},{int(x_[i, j, -1])},{topn - j},{int(y_[i, j])}\n')

    # run the prediction again for metric calculation; I couldn't find a better way that reuses y_ from above
    results = compute_metrics(config.metrics, model, test_dl, dev)

Thank you! The code looks OK. We will be working on a similar script after we release the reproducibility guide.

And I can confirm there is no straightforward way at the moment to calculate the metrics from the config while reusing your reranked x_ and y_.
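
In the meantime, a minimal sketch of computing NDCG@k directly from the reranked labels (plain numpy, not allRank's metric code; y_ from __rank_slates above holds the true labels sorted by predicted score per slate, and this assumes the slates contain no padding rows):

import numpy as np

def ndcg_at_k(y_sorted_by_pred, k=10):
    """NDCG@k for one slate whose true labels are already sorted by predicted score."""
    y = np.asarray(y_sorted_by_pred, dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))  # 1/log2(rank+1) for ranks 1..k
    top = y[:k]
    dcg = np.sum((2.0 ** top - 1.0) * discounts[:len(top)])
    ideal = np.sort(y)[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1.0) * discounts[:len(ideal)])
    return dcg / idcg if idcg > 0 else 0.0

# mean NDCG@10 over all slates returned by __rank_slates
mean_ndcg = np.mean([ndcg_at_k(slate, k=10) for slate in y_])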

Is there any update on this? I would also like to see the prediction results for my datasets.
Thanks in advance :)