re-implementation of DeepFM with pytorch 1.0
This implementation requires model to receive data batches in the following format:
- label: target of each sample in the dataset (1/0 for classification)
- idxs: [[ind1_1, ind1_2, ...], [ind2_1, ind2_2, ...], ..., [indi_1, indi_2, ..., indi_j, ...], ...]
- indi_j is the feature index of feature field j of sample i in the dataset
- vals: [[val1_1, val1_2, ...], [val2_1, val2_2, ...], ..., [vali_1, vali_2, ..., vali_j, ...], ...]
- vali_j is the feature value of feature field j of sample i in the dataset
- vali_j can be either binary (1/0, for binary/categorical features) or float (e.g., 12.34, for numerical features)
The main.py
has already provided methods to process numeric and categorical features, so the only thing you need to do is changing data_path
and hard coding the column names to tell the program which columns you want to re-format.
e.g.
dummy_cols
is the list of template codes to show result index;category_cols
is the list of categorical column names- Confirm that there is no other columns except columns mentioned above in your own dataframe, then the code will automatically extract the
numeric_cols
data = pd.read_csv('./temp_data.csv').reset_index(drop=True)
category_cols = ['CODE_GENDER', 'FLAG_OWN_CAR', 'FLAG_OWN_REALTY']
dummy_cols = ['SK_ID_CURR']
target_col = 'TARGET'
numeric_cols = list(set(data.columns) - set(category_cols + dummy_cols + [target_col]))
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction, Huifeng Guo, Ruiming Tang, Yunming Yey, Zhenguo Li, Xiuqiang He.