maciejkula/spotlight

[FEATURE REQUEST] batch prediction

amirj opened this issue · 3 comments

amirj commented

I have a dataset containing 5,115,123 training samples and 1,278,781 test samples (default train/test split ratio = 0.2).
Training each model takes a few minutes on GPU (100% utilization), but when I run the following command:
mrr_baseline = mrr_score(model_baseline, dataset_test).mean()
it takes hours, and GPU utilization drops to only 20%.
Why? Do you have any suggestions for making prediction faster?

I'm having the same issue with certain models (e.g. the bilinear neural network model).

amirj commented

The problem is that model.predict(user_id) is called for each user in a loop, which is time-consuming (see the source code).
@maciejkula Is there any way to get predictions for all items and all users faster?
I mean, instead of iterating in a loop and getting predictions for each user, do it for all users in one step.
The current API doesn't allow this kind of prediction: if user_ids is an array, it must be matched element-wise with item_ids.
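For illustration, the per-user loop described above amounts to something like the following (a sketch with a stand-in model, not Spotlight's actual classes; `DummyModel` and its shapes are assumptions):

```python
import numpy as np

class DummyModel:
    """Stand-in for a factorization model with a per-user predict()."""

    def __init__(self, num_users, num_items, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.user_embeddings = rng.standard_normal((num_users, dim))
        self.item_embeddings = rng.standard_normal((num_items, dim))

    def predict(self, user_id):
        # One small matrix-vector product per call: far too little work
        # to keep a GPU busy, so Python loop overhead dominates.
        return self.item_embeddings @ self.user_embeddings[user_id]

model = DummyModel(num_users=100, num_items=50, dim=8)

# Slow pattern: one predict() call (one tiny kernel launch) per user.
all_scores = np.stack([model.predict(u) for u in range(100)])
```

Each call does only a `dim × num_items` amount of work, which is why GPU utilization stays low over millions of users.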

maciejkula commented

This issue is more pronounced for the bilinear models because they do very little computation per user. For this and many other reasons, I strongly recommend using the sequence models.

@amirj the easiest solution for you is to write your own batched predict implementation.
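A batched predict for a bilinear model can be written as a single matrix product (a sketch assuming the model's user/item embeddings and biases have been extracted as NumPy arrays; the variable names here are hypothetical, not Spotlight attributes):

```python
import numpy as np

def batched_predict(user_embeddings, item_embeddings,
                    user_biases, item_biases):
    """Score every (user, item) pair in one shot.

    Returns an array of shape (num_users, num_items), replacing
    num_users separate predict() calls with one matrix product
    plus broadcast bias additions.
    """
    return (user_embeddings @ item_embeddings.T
            + user_biases[:, None]
            + item_biases[None, :])

rng = np.random.default_rng(0)
U = rng.standard_normal((1000, 32))   # user embeddings
V = rng.standard_normal((200, 32))    # item embeddings
bu = rng.standard_normal(1000)        # user biases
bi = rng.standard_normal(200)         # item biases

scores = batched_predict(U, V, bu, bi)
```

For very large user/item counts the full score matrix may not fit in memory, so in practice you would apply this over chunks of users and compute the MRR per chunk.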