SetRank doesn't work when the number of input documents varies from training to testing.

Question


Opened this issue 4 years ago · 1 comments

The current version of SetRank does not work when the number of input documents differs between training and testing. For example:

If you create a SetRank model with 100 input documents in training, you cannot use it to rank a test query with 10 candidate documents unless you pad the list explicitly.
If you create a SetRank model with 10 input documents in training, you cannot use it to rank a test query with 100 candidate documents.
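
To make the "explicit padding" workaround in the first case concrete, here is a minimal sketch in plain Python. It is not part of the SetRank code or of any library API: `pad_candidates` and `rank_real_docs` are hypothetical helper names, and the zero-valued padding rows are an assumption. The idea is to pad a short candidate list up to the list size the model was built with, keep a mask of the real documents, and drop the padded positions after scoring.

```python
def pad_candidates(features, list_size, pad_value=0.0):
    """Pad a list of per-document feature vectors up to list_size rows.

    Returns (padded_features, mask), where mask[i] is True for real documents
    and False for padding rows. Hypothetical helper, for illustration only.
    """
    num_docs = len(features)
    if num_docs > list_size:
        # The second case in the question: padding cannot help here.
        raise ValueError(
            f"{num_docs} candidates exceed the model's list size {list_size}")
    feature_size = len(features[0]) if features else 0
    padding = [[pad_value] * feature_size for _ in range(list_size - num_docs)]
    mask = [True] * num_docs + [False] * (list_size - num_docs)
    return [list(row) for row in features] + padding, mask


def rank_real_docs(scores, mask):
    """Return indices of real documents, highest score first.

    Padded positions are excluded regardless of the score they received.
    """
    real = [i for i, is_real in enumerate(mask) if is_real]
    return sorted(real, key=lambda i: scores[i], reverse=True)


# Usage: a test query with 10 candidates and a model built for 100 documents.
features = [[1.0] * 700 for _ in range(10)]
padded, mask = pad_candidates(features, list_size=100)
scores = [float(i) for i in range(100)]  # stand-in for the model's output
ranking = rank_real_docs(scores, mask)
```

Padding only covers the case where the test list is shorter than the training list. Note also that a model which attends over the whole input may still be influenced by the padded rows unless it masks them internally.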

Answer 1 · 2020-08-23T22:40:35.000Z

I think the cause might be a compatibility issue with TensorFlow. Below is the log from a run in which I built SetRank and trained it on 1% of Yahoo set 1, with a cutoff of 10 in training and no cutoff for validation and test. It works fine. Could you please give me the details of your implementation so that I can fix it?

Finished reading 190 queries with lists.
Train Rank list size 73
Valid Rank list size 73
Users can only see the top 10 documents for each query in training.
Creating model...
Build DLA
build SetRank
Loss Function is click_weighted_softmax_cross_entropy
Created model with fresh parameters.
Create simluated clicks feed
click_model_json=result_log/Yahoo_trial/eta_1.0/dla_e/Offline_dla_e_eta_1.0_0/pbm_0.1_1.0_4_eta_1.0.json
Create direct label feed with list size 73 with feature size 700
global step 100 learning rate 0.1000 step-time 0.07 loss 4.3165
valid: err_1:0.372 err_3:0.444 err_5:0.464 err_10:0.478 ndcg_1:0.697 ndcg_3:0.699 ndcg_5:0.724 ndcg_10:0.768
Save model, valid ndcg_10:0.768
global step 200 learning rate 0.1000 step-time 0.05 loss 4.2747
valid: err_1:0.382 err_3:0.453 err_5:0.471 err_10:0.486 ndcg_1:0.721 ndcg_3:0.714 ndcg_5:0.733 ndcg_10:0.773
Save model, valid ndcg_10:0.773
global step 300 learning rate 0.1000 step-time 0.05 loss 4.2738
valid: err_1:0.399 err_3:0.465 err_5:0.485 err_10:0.498 ndcg_1:0.757 ndcg_3:0.747 ndcg_5:0.759 ndcg_10:0.795
Save model, valid ndcg_10:0.795
global step 400 learning rate 0.1000 step-time 0.05 loss 4.2766
valid: err_1:0.387 err_3:0.454 err_5:0.475 err_10:0.489 ndcg_1:0.734 ndcg_3:0.728 ndcg_5:0.752 ndcg_10:0.791
Reading data in /home/taoyang/research/datasets/full_yahoo/yahoo_toy/tmp_data/
Read data from /home/taoyang/research/datasets/full_yahoo/yahoo_toy/tmp_data//test in ULTRA format.
Feature reading finish.
List reading finish.
Label reading finish.
Remove 1 invalid queries.
Data reading finish!
Finished reading 190 queries with lists.
Build DLA
learning_rate=0.1,ranker_learning_rate=0.1
Unknown hyperparameter type for ranker_learning_rate
build SetRank

Reading model parameters from result_log/Yahoo_trial/eta_1.0/dla_e/Offline_dla_e_eta_1.0_0/tmp_model/ultra.learning_algorithm.DLA.ckpt-300
Create direct label feed with list size 73 with feature size 700
Testing 100% finished
[Done]
eval: err_1:0.399 err_3:0.466 err_5:0.485 err_10:0.500 ndcg_1:0.754 ndcg_3:0.747 ndcg_5:0.759 ndcg_10:0.795