reczoo/BARS

download csv and use h5


When I download the Avazu dataset from Kaggle, it comes in csv format.
But when I run the code, the data format in DeepFM_avazu_x4_tuner_config_01.yaml is h5.
How can I convert the csv files to h5?

Please refer to issue #5.

Alternatively, you can change the dataset_config to read the csv files directly, which is equivalent:

dataset_config:
  avazu_x4:
      data_root: ../data/Avazu/
      data_format: csv
      train_data: ../data/Avazu/Avazu_x4/train.csv
      valid_data: ../data/Avazu/Avazu_x4/valid.csv
      test_data: ../data/Avazu/Avazu_x4/test.csv
      min_categr_count: 2
      feature_cols:
          - {name: id, active: False, dtype: str, type: categorical}
          - {name: hour, active: True, dtype: str, type: categorical, preprocess: convert_hour}
          - {name: [C1,banner_pos,site_id,site_domain,site_category,app_id,app_domain,app_category,device_id,
                    device_ip,device_model,device_type,device_conn_type,C14,C15,C16,C17,C18,C19,C20,C21], 
             active: True, dtype: str, type: categorical}
          - {name: weekday, active: True, dtype: str, type: categorical, preprocess: convert_weekday}
          - {name: weekend, active: True, dtype: str, type: categorical, preprocess: convert_weekend}
      label_col: {name: click, dtype: float}

In either case, please make sure the md5sum values of your data files match ours, so that the results can be reproduced.