Making sure I understand the data format
alexklibisz opened this issue · 1 comment
alexklibisz commented
Thanks for updating this wrapper and getting it working.
I want to make sure I understand how to format the input data for training. The LibFFM repo gives this example:
Click  Advertiser  Publisher
=====  ==========  =========
    0        Nike        CNN
    1        ESPN        BBC
Here, we have
* 2 fields: Advertiser and Publisher
* 4 features: Advertiser-Nike, Advertiser-ESPN, Publisher-CNN, Publisher-BBC
To format this as [[(field, index, value), ...], ...], would this be correct:
# Fields: 0 = Advertiser, 1 = Publisher
# Indexes: (0,0) = Advertiser-Nike, (0,1) = Advertiser-ESPN, (1,0) = Publisher-CNN, (1,1) = Publisher-BBC
# Values: 0 = absent, 1 = present (i.e. one-hot encoding).
X = [[(0, 0, 1), (1, 0, 1)],
     [(0, 1, 1), (1, 1, 1)]]
Thanks again
alexeygrigorev commented
Yes, you got it right.
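For anyone landing here later, the encoding confirmed above can be built from the raw rows with a small helper. This is only a sketch in plain Python: `encode_rows` and `index_tables` are hypothetical names, not part of the wrapper, and it assumes every field is categorical and that indexes are numbered per field, as in the question.

```python
def encode_rows(rows, field_names):
    """Turn rows of categorical values into [(field, index, value), ...] lists.

    Hypothetical helper, not part of the wrapper. Indexes are assigned
    per field in order of first appearance, mirroring the question above.
    """
    # One lookup table per field: category -> index within that field.
    index_tables = [{} for _ in field_names]
    X = []
    for row in rows:
        encoded = []
        for field, name in enumerate(field_names):
            table = index_tables[field]
            # A category seen for the first time gets the next free index.
            index = table.setdefault(row[name], len(table))
            encoded.append((field, index, 1))  # 1 = present (one-hot)
        X.append(encoded)
    return X, index_tables


rows = [
    {"Click": 0, "Advertiser": "Nike", "Publisher": "CNN"},
    {"Click": 1, "Advertiser": "ESPN", "Publisher": "BBC"},
]
X, index_tables = encode_rows(rows, ["Advertiser", "Publisher"])
y = [row["Click"] for row in rows]  # the Click column is the label
print(X)
```

Keeping `index_tables` around matters at prediction time: new rows should be encoded with the same category-to-index mapping that was used for training.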