ChenglongChen/kaggle-CrowdFlower

Statistical Distance Features for Test data

zhangxiangnick opened this issue · 1 comment

How do you generate the statistical distance features (described in Sect. 3.2.2 of your notes) for the test data? There are no median_relevance labels for the test data, so how is it possible to group the test data by median_relevance?

Only the training data is grouped, so no test labels are needed. Group the training data by (query, median_relevance), then compute the statistical distance between each test sample and the corresponding training groups that share its query. You can have a look at the code.
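
To make the idea concrete, here is a minimal sketch of that grouping, not the repository's actual code. The row layout (dicts with `query`, `title`, `median_relevance`), the helper names `jaccard`, `build_groups`, and `stat_distance_features`, the choice of word-level Jaccard similarity as the distance, the relevance levels 1 to 4, and the `-1.0` fill value for empty groups are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def jaccard(a, b):
    """Word-level Jaccard similarity between two strings (assumed distance)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_groups(train_rows):
    """Map (query, median_relevance) -> list of training titles."""
    groups = defaultdict(list)
    for row in train_rows:
        groups[(row["query"], row["median_relevance"])].append(row["title"])
    return groups


def stat_distance_features(sample, groups, levels=(1, 2, 3, 4), missing=-1.0):
    """Summarize the sample's similarity to each training group of its query.

    The sample needs no label: only its query is used to look up the groups.
    """
    feats = {}
    for level in levels:
        # .get avoids creating empty entries in the defaultdict
        titles = groups.get((sample["query"], level), [])
        sims = [jaccard(sample["title"], t) for t in titles]
        stats = {
            "mean": mean(sims) if sims else missing,
            "median": median(sims) if sims else missing,
            "std": pstdev(sims) if sims else missing,
            "min": min(sims) if sims else missing,
            "max": max(sims) if sims else missing,
        }
        for name, value in stats.items():
            feats[f"sim_{name}_rel{level}"] = value
    return feats


# Hypothetical toy data, only to show the call pattern.
train_rows = [
    {"query": "led tv", "title": "samsung led tv", "median_relevance": 4},
    {"query": "led tv", "title": "led tv stand", "median_relevance": 4},
    {"query": "led tv", "title": "garden hose", "median_relevance": 1},
]
test_sample = {"query": "led tv", "title": "sony led tv"}  # no label needed

groups = build_groups(train_rows)
feats = stat_distance_features(test_sample, groups)
```

Each test sample ends up with one block of summary statistics per relevance level, all computed against training rows only, which is why the missing test labels are not a problem.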