/talkingdata

Kaggle competition - talking data

Primary LanguageJupyter Notebook

Kaggle-TalkingData

The aim of this project is to predict the demographics of a user (gender and age) based on their app download and usage behaviours. The data sets at our disposal are the following:

  • gender_age_train.csv, gender_age_test.csv: the training and test set

    • group: this is the target variable you are going to predict
  • events.csv, app_events.csv: when a user uses TalkingData SDK (the app used for the study), the event gets logged in this data. Each event has an event id, location (lat/long), and the event corresponds to a list of apps in app_events.

    • timestamp: when the user is using an app with TalkingData SDK
  • app_labels.csv: apps and their labels, the label_id's can be used to join with label_categories

  • label_categories.csv: apps' labels and their categories in text

  • phone_brand_device_model.csv: device ids, brand, and models

  • phone_brand: note that the brands are in Chinese (translation courtesy of user fromandto)

      三星 samsung
    
      天语 Ktouch
    
      海信 hisense
    
      联想 lenovo
    
      欧比 obi
    
      爱派尔 ipair
    
      努比亚 nubia
    
      优米 youmi
    
      朵唯 dowe
    
      黑米 heymi
    
      锤子 hammer
    
      酷比魔方 koobee
    
      美图 meitu
    
      尼比鲁 nibilu
    
      一加 oneplus
    
      优购 yougo
    
      诺基亚 nokia
    
      糖葫芦 candy
    
      **移动 ccmc
    
      语信 yuxin
    
      基伍 kiwu
    
      青橙 greeno
    
      华硕 asus
    
      夏新 panosonic
    
      维图 weitu
    
      艾优尼 aiyouni
    
      摩托罗拉 moto
    
      乡米 xiangmi
    
      米奇 micky
    
      大可乐 bigcola
    
      沃普丰 wpf
    
      神舟 hasse
    
      摩乐 mole
    
      飞秒 fs
    
      米歌 mige
    
      富可视 fks
    
      德赛 desci
    
      梦米 mengmi
    
      乐视 lshi
    
      小杨树 smallt
    
      纽曼 newman
    
      邦华 banghua
    
      E派 epai
    
      易派 epai
    
      普耐尔 pner
    
      欧新 ouxin
    
      西米 ximi
    
      海尔 haier
    
      波导 bodao
    
      糯米 nuomi
    
      唯米 weimi
    
      酷珀 kupo
    
      谷歌 google
    
      昂达 ada
    
      聆韵 lingyun
      sample_submission.csv - a sample submission file in the correct format