alexeygrigorev/libffm-python

Making sure I understand the data format

alexklibisz opened this issue · 1 comment

Thanks for updating this wrapper and getting it working.

I want to make sure I understand how to format the input data for training. The LibFFM repo gives this example:

```
Click  Advertiser  Publisher
=====  ==========  =========
    0        Nike        CNN
    1        ESPN        BBC
```

Here, we have:

    * 2 fields: Advertiser and Publisher

    * 4 features: Advertiser-Nike, Advertiser-ESPN, Publisher-CNN, Publisher-BBC
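The field and feature counts above follow mechanically from the table: each field is a categorical column, and each feature is one field-value pair. A minimal plain-Python sketch of that derivation; `list_features` is a hypothetical helper for illustration, not part of libffm-python or LibFFM.

```python
def list_features(rows, field_names):
    """Return the distinct field-value pairs ("features") in first-seen order."""
    features = []
    for row in rows:
        # Skip the label column; the remaining columns line up with field_names.
        for name, value in zip(field_names, row[1:]):
            feature = f"{name}-{value}"
            if feature not in features:
                features.append(feature)
    return features

# Rows from the example table: (Click, Advertiser, Publisher).
rows = [(0, "Nike", "CNN"), (1, "ESPN", "BBC")]
field_names = ["Advertiser", "Publisher"]

features = list_features(rows, field_names)
print(len(field_names), "fields:", field_names)
print(len(features), "features:", features)
```

The features come out in first-seen order (Advertiser-Nike, Publisher-CNN, Advertiser-ESPN, Publisher-BBC), which is the same set of four listed above.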

To format this as `[[(field, index, value), ...], ...]`, would the following be correct?

```python
# Fields: 0 = Advertiser, 1 = Publisher
# Indices (field, index): (0, 0) = Advertiser-Nike, (0, 1) = Advertiser-ESPN,
#                         (1, 0) = Publisher-CNN,   (1, 1) = Publisher-BBC
# Values: 0 = absent, 1 = present (i.e. one-hot encoding)

X = [[(0, 0, 1), (1, 0, 1)],
     [(0, 1, 1), (1, 1, 1)]]
```
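To double-check the layout, it can help to build `X` from the raw categorical columns and print the equivalent lines in LibFFM's text format (`label field:index:value ...`). A minimal plain-Python sketch, assuming indices are numbered per field in order of first appearance as in the comments above; `encode` and `to_libffm_lines` are hypothetical helpers, not part of the libffm-python wrapper.

```python
def encode(rows, num_fields):
    """One-hot encode categorical rows as [(field, index, value), ...] lists.

    Indices are assigned per field in order of first appearance, so within
    Advertiser, Nike -> 0 and ESPN -> 1 (and likewise within Publisher).
    """
    vocab = [{} for _ in range(num_fields)]  # one value -> index map per field
    X = []
    for values in rows:
        row = []
        for field, value in enumerate(values):
            # setdefault reads len() before inserting, so new values get 0, 1, ...
            index = vocab[field].setdefault(value, len(vocab[field]))
            row.append((field, index, 1))  # value 1 = feature present
        X.append(row)
    return X, vocab

def to_libffm_lines(X, y):
    """Render each row as a LibFFM text line: label field:index:value ..."""
    return [
        " ".join([str(label)] + [f"{f}:{j}:{v}" for f, j, v in row])
        for row, label in zip(X, y)
    ]

y = [0, 1]  # the Click column
X, vocab = encode([("Nike", "CNN"), ("ESPN", "BBC")], num_fields=2)
print(X)  # [[(0, 0, 1), (1, 0, 1)], [(0, 1, 1), (1, 1, 1)]]
for line in to_libffm_lines(X, y):
    print(line)
# 0 0:0:1 1:0:1
# 1 0:1:1 1:1:1
```

For comparison, the LibFFM README's own walkthrough of this table builds a single feature dictionary shared by all fields, so its feature indices run from 0 to 3 rather than restarting in each field.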

Thanks again

Yes, you got it right.