Kaggle-TalkingData

The aim of this project is to predict the demographics of a user (gender and age) based on their app download and usage behaviours. The data sets at our disposal are the following:

gender_age_train.csv, gender_age_test.csv: the training and test sets
- group: the target variable to predict
events.csv, app_events.csv: whenever a user uses the TalkingData SDK (the app used for the study), the event is logged in these files. Each event has an event id and a location (lat/long), and corresponds to a list of apps in app_events.
- timestamp: when the user was using an app with the TalkingData SDK
app_labels.csv: apps and their labels; the label_id values can be used to join with label_categories
label_categories.csv: the apps' labels and their categories in text
phone_brand_device_model.csv: device ids, brands, and models

phone_brand: note that the brands are in Chinese (translation courtesy of user fromandto)

  三星 samsung

  天语 Ktouch

  海信 hisense

  联想 lenovo

  欧比 obi

  爱派尔 ipair

  努比亚 nubia

  优米 youmi

  朵唯 dowe

  黑米 heymi

  锤子 hammer

  酷比魔方 koobee

  美图 meitu

  尼比鲁 nibilu

  一加 oneplus

  优购 yougo

  诺基亚 nokia

  糖葫芦 candy

  中国移动 ccmc

  语信 yuxin

  基伍 kiwu

  青橙 greeno

  华硕 asus

  夏新 panosonic

  维图 weitu

  艾优尼 aiyouni

  摩托罗拉 moto

  乡米 xiangmi

  米奇 micky

  大可乐 bigcola

  沃普丰 wpf

  神舟 hasse

  摩乐 mole

  飞秒 fs

  米歌 mige

  富可视 fks

  德赛 desci

  梦米 mengmi

  乐视 lshi

  小杨树 smallt

  纽曼 newman

  邦华 banghua

  E派 epai

  易派 epai

  普耐尔 pner

  欧新 ouxin

  西米 ximi

  海尔 haier

  波导 bodao

  糯米 nuomi

  唯米 weimi

  酷珀 kupo

  谷歌 google

  昂达 ada

  聆韵 lingyun

sample_submission.csv: a sample submission file in the correct format
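
Since the phone_brand values are in Chinese, it can help to map them to their English names before exploring the data. The following is a minimal sketch, assuming a plain dictionary lookup is enough; `BRAND_TRANSLATIONS` and `translate_brand` are hypothetical names, and only a handful of entries from the translation list above are included.

```python
# A few entries copied from the translation list above; extend as needed.
BRAND_TRANSLATIONS = {
    "三星": "samsung",
    "联想": "lenovo",
    "诺基亚": "nokia",
    "华硕": "asus",
    "谷歌": "google",
}


def translate_brand(brand):
    """Return the English name for a known brand, else the stripped input."""
    key = brand.strip()
    return BRAND_TRANSLATIONS.get(key, key)


print(translate_brand("三星"))
```

Brands missing from the dictionary are passed through unchanged, so a partial mapping does not lose any rows.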

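The files described above link together through shared ids: an event lists apps in app_events, each app has labels in app_labels, and each label_id has a text category in label_categories. The sketch below walks that chain using only the standard library. The rows are made up for illustration, and the column names `app_id` and `category` are assumptions that should be checked against the real headers.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up rows standing in for the real CSV files (illustrative only).
APP_EVENTS_CSV = """event_id,app_id
1,100
1,101
2,100
"""
APP_LABELS_CSV = """app_id,label_id
100,7
101,7
101,8
"""
LABEL_CATEGORIES_CSV = """label_id,category
7,games
8,finance
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def categories_per_event(app_events, app_labels, label_categories):
    """Map each event_id to the sorted category names of the apps it lists."""
    category_of = {row["label_id"]: row["category"] for row in label_categories}
    labels_of = defaultdict(set)
    for row in app_labels:
        labels_of[row["app_id"]].add(row["label_id"])
    result = defaultdict(set)
    for row in app_events:
        for label_id in labels_of[row["app_id"]]:
            # Skip labels that have no text category.
            if label_id in category_of:
                result[row["event_id"]].add(category_of[label_id])
    return {event_id: sorted(cats) for event_id, cats in result.items()}


event_categories = categories_per_event(
    read_rows(APP_EVENTS_CSV),
    read_rows(APP_LABELS_CSV),
    read_rows(LABEL_CATEGORIES_CSV),
)
print(event_categories)
```

On the real files the same chain would more likely be written as a series of joins in a dataframe library; the dictionary version is used here only to keep the example dependency-free.
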