
๐Ÿ†์‹ ์šฉ์นด๋“œ ์‚ฌ์šฉ์ž ์—ฐ์ฒด ์˜ˆ์ธก AI ๊ฒฝ์ง„๋Œ€ํšŒ 2๋“ฑ ์†”๋ฃจ์…˜๐Ÿ†

Primary language: Python; license: Apache License 2.0 (Apache-2.0)

predict-delinquency

Code style: black

Feature Engineering

  • income_type: raw feature, used as-is
  • edu_type: raw feature, used as-is
  • family_type: raw feature, used as-is
  • house_type: raw feature, used as-is
  • occyp_type: raw feature, used as-is
  • income_total: numeric variable converted into a categorical one
  • begin_month: key variable for telling apart records that are otherwise duplicates
  • DAYS_BIRTH_month: month component of DAYS_BIRTH
  • DAYS_BIRTH_week: week component of DAYS_BIRTH
  • Age: age in years derived from DAYS_BIRTH
  • DAYS_EMPLOYED_month: month component of DAYS_EMPLOYED
  • DAYS_EMPLOYED_week: week component of DAYS_EMPLOYED
  • EMPLOYED: years of employment derived from DAYS_EMPLOYED
  • before_EMPLOYED: difference between DAYS_BIRTH and DAYS_EMPLOYED, i.e. time lived before the current employment
  • before_EMPLOYED_month: month component of the period before employment
  • before_EMPLOYED_week: week component of the period before employment
  • gender_car_reality: gender, car ownership, and real estate ownership combined into a single variable
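A minimal pandas sketch of how the derived features above could be computed. It assumes, as in the competition data, that `DAYS_BIRTH` and `DAYS_EMPLOYED` are negative day counts relative to the data collection date, that positive `DAYS_EMPLOYED` values mark people who are not employed (treated as 0 days here), and that the raw columns are named `gender`, `car`, `reality` and `income_total`. The function name `add_features` and the 30-day month / 7-day week folding are illustrative conventions, not necessarily the authors' exact formulas.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the derived features listed above."""
    out = df.copy()

    # Assumption: DAYS_* are negative day counts before the reference date;
    # positive DAYS_EMPLOYED values (not employed) are treated as 0 days.
    birth = -out["DAYS_BIRTH"]
    employed = (-out["DAYS_EMPLOYED"]).clip(lower=0)
    before = birth - employed  # days lived before the current employment

    def split_days(days: pd.Series, prefix: str) -> None:
        # Assumed convention: 30-day months folded into 0..11,
        # 7-day weeks folded into 0..3
        out[f"{prefix}_month"] = (days // 30) % 12
        out[f"{prefix}_week"] = (days // 7) % 4

    split_days(birth, "DAYS_BIRTH")
    split_days(employed, "DAYS_EMPLOYED")
    split_days(before, "before_EMPLOYED")

    out["Age"] = birth // 365          # age in whole years
    out["EMPLOYED"] = employed // 365  # years employed
    out["before_EMPLOYED"] = before

    # Treat income as a category instead of a continuous number
    out["income_total"] = out["income_total"].astype(str)

    # Combine gender, car and real-estate flags into one categorical feature
    out["gender_car_reality"] = (
        out["gender"].astype(str)
        + "_" + out["car"].astype(str)
        + "_" + out["reality"].astype(str)
    )
    return out
```

For a row with `DAYS_BIRTH = -13899` and `DAYS_EMPLOYED = -4709`, this gives `Age = 38`, `EMPLOYED = 12` and `before_EMPLOYED = 9190`.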

Core Model

  • Preprocessing of the categorical features is essential
  • CatBoost, with the categorical features explicitly specified, improved model performance
  • In comparison with LightGBM and XGBoost, CatBoost performed better
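Because CatBoost only accepts categorical values as strings or integers and rejects NaN in categorical columns, one way to do the categorical preprocessing is to cast each categorical column to strings with an explicit placeholder for missing values, then hand the column positions to CatBoost. This is a sketch: the column list in `CAT_COLS`, the `"missing"` placeholder and the helper name `prepare_for_catboost` are assumptions, not the authors' configuration.

```python
import pandas as pd

# Assumed set of categorical columns, taken from the feature list above
CAT_COLS = (
    "income_type", "edu_type", "family_type", "house_type", "occyp_type",
    "income_total", "gender_car_reality",
)


def prepare_for_catboost(df: pd.DataFrame, cat_cols=CAT_COLS):
    """Cast categorical columns to str and return (frame, cat_feature_indices)."""
    out = df.copy()
    for col in cat_cols:
        # CatBoost needs str/int categories; replace missing values explicitly
        out[col] = out[col].map(lambda v: "missing" if pd.isna(v) else str(v))
    # Column positions, suitable for CatBoost's cat_features parameter
    cat_idx = [out.columns.get_loc(c) for c in cat_cols]
    return out, cat_idx
```

The returned indices can then be passed to CatBoost's `cat_features` parameter, for example `CatBoostClassifier(cat_features=cat_idx, loss_function="MultiClass")` for the multiclass target.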

Cross Validation Strategy

  • Stratified K-Fold: a technique commonly used for multiclass classification; it keeps the proportion of each label in every fold matched to the full training set
  • Using 10 folds improved performance
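The stratified 10-fold scheme can be sketched with scikit-learn's `StratifiedKFold`: each fold trains a fresh copy of the model, fills in the out-of-fold (OOF) probabilities for its validation rows, and averages the test-set probabilities across folds. The function name `oof_predict`, the shuffle seed and the use of a generic scikit-learn-compatible classifier are assumptions for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold


def oof_predict(model, X, y, X_test, n_splits=10, seed=42):
    """Return OOF class probabilities and fold-averaged test probabilities."""
    X, y, X_test = np.asarray(X), np.asarray(y), np.asarray(X_test)
    n_classes = len(np.unique(y))
    oof = np.zeros((len(y), n_classes))
    test_pred = np.zeros((len(X_test), n_classes))

    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, valid_idx in skf.split(X, y):
        fold_model = clone(model)  # fresh, unfitted copy for every fold
        fold_model.fit(X[train_idx], y[train_idx])
        # Each training row is predicted exactly once, by a model that never saw it
        oof[valid_idx] = fold_model.predict_proba(X[valid_idx])
        test_pred += fold_model.predict_proba(X_test) / n_splits
    return oof, test_pred
```

Stratification is what keeps every class present in every training fold, so `predict_proba` always returns `n_classes` columns; `StratifiedKFold` raises an error if any class has fewer than `n_splits` members.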

Hyperparameter Strategy

  • Hyperparameters were tuned quickly with Bayesian optimization using TPE, an AutoML-style approach
  • The goal was to reach the best results in the shortest possible time
  • CatBoost was not sensitive to its hyperparameters, but Bayesian TPE was still used to squeeze out additional performance
  • LightGBM and XGBoost were also tuned with Bayesian TPE
  • RandomForest and TabNet were tuned by hand

Ensemble Model

  • Stacking Ensemble์„ ํ†ตํ•ด์„œ Neural Network๊ฐ€ ํ™•๋ฅ ๊ฐ’์„ ์ž˜ ํ•™์Šตํ•˜๋Š” ๋ชจ๋ธ ๊ตฌ์ถ•

Model Architecture


Benchmark

Lower scores are better.

| Model | OOF (10-fold) | Public LB |
|---|---|---|
| LightGBM | 0.68714 | 0.68591 |
| XGBoost | 0.68901 | 0.68900 |
| RandomForest | 0.69137 | 0.69296 |
| TabNet | 0.80392 | 0.77971 |
| CatBoost | 0.67234 | 0.67288 |
| Stacking Ensemble | 0.67069 | 0.67048 |

Paper

Ranking

3rd on the public leaderboard, 2nd on the private leaderboard