reczoo/BARS

Click history sequence feature in the Taobao dataset?

Closed this issue · 1 comments

Hello, the Taobao dataset contains a raw behavior dataset. However, according to L36 - L56 of https://github.com/openbenchmark/BARS/blob/master/ctr_prediction/datasets/Taobao/Taobao_x1/split_taobao_x1.py#L36, the click history sequence is created from the raw_sample dataset and does not use the raw behavior dataset. Why is that?

xpai commented

Sorry for the late reply. We use the raw_sample data to create historical sequences since we would like to use item IDs in the sequence. The raw behavior data have no such information. But after some experiments, we found that creating behavior sequences in such a way for Taobao data does not work for DIN, i.e., there is no gain when using target attention.
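To illustrate the idea of building a history of item IDs from the click samples themselves, here is a minimal sketch. It is not the code from split_taobao_x1.py. The field names `user`, `time_stamp`, `adgroup_id`, and `clk`, the function name `build_click_history`, and the `max_len` truncation are all assumptions made for this example.

```python
from collections import defaultdict


def build_click_history(samples, max_len=50):
    """Attach to each sample the item IDs the same user clicked earlier.

    `samples` is a list of dicts with the (assumed) keys
    user, time_stamp, adgroup_id, clk.
    """
    # Process samples in time order so a history only holds past events.
    # sorted() is stable, so samples with equal timestamps keep input order.
    ordered = sorted(samples, key=lambda s: s["time_stamp"])
    clicked = defaultdict(list)  # user -> item IDs clicked so far
    result = []
    for s in ordered:
        past = clicked[s["user"]]
        # Keep only the most recent max_len clicks.
        history = past[-max_len:] if max_len > 0 else []
        result.append({**s, "click_history": list(history)})
        # Record the click after emitting, so a sample never sees itself.
        if s["clk"] == 1:
            past.append(s["adgroup_id"])
    return result


if __name__ == "__main__":
    demo = [
        {"user": 1, "time_stamp": 10, "adgroup_id": "a", "clk": 1},
        {"user": 1, "time_stamp": 20, "adgroup_id": "b", "clk": 0},
        {"user": 1, "time_stamp": 30, "adgroup_id": "d", "clk": 1},
    ]
    for row in build_click_history(demo):
        print(row["adgroup_id"], row["click_history"])
```

Only clicked samples contribute to the history here, and the target item of the current sample is never included in its own sequence.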
Recently, we refined the data preprocessing and named the new version "taobaoad_x1". Please check the code for reference:
https://github.com/openbenchmark/BARS/blob/master/datasets/Taobao/TaobaoAd_x1/convert_taobaoad_x1.py

In this version, DIN improves over DCN on all three metrics (gAUC +0.0026, AUC +0.0036, logloss -0.0006):
DCN:
[Metrics] gAUC: 0.573908 - AUC: 0.648805 - logloss: 0.193040
DIN:
[Metrics] gAUC: 0.576459 - AUC: 0.652399 - logloss: 0.192445