/level2_dkt_recsys-level2-recsys-10

level2_dkt_recsys-level2-recsys-10 created by GitHub Classroom

Primary Language: Jupyter Notebook

📚 Knowledge State Inference (Deep Knowledge Tracing)

1. Project Overview

1-1. Project Topic

This task uses knowledge components and knowledge states to continuously track a learner's changing knowledge state. Given the sequence of problems a user has solved, the goal is to predict whether the user's answer to the next item will be correct or incorrect.

1-2. Project Period

2022.11.14 ~ 2022.12.08 (4 weeks)

1-3. Equipment and Tools

  • Development environment: VS Code, PyTorch, Jupyter, Ubuntu 18.04.5 LTS, GPU Tesla V100-PCIE-32GB
  • Collaboration tools: GitHub, Notion
  • Visualization: WandB

1-4. Project Structure

|-- boosting
|   |-- XGBoptuna.ipynb
|   |-- boosting_baseline.py
|   |-- src
|   |-- train.py
|-- dkt
|   |-- README.md
|   |-- args.py
|   |-- inference.py
|   |-- requirements.txt
|   |-- src
|   |-- sweep.yaml
|   |-- train.py
|   |-- tuning.py
|   |-- wandb_train.py
|-- ensembles
|   |-- ensembles.py
|-- lgbm
|   |-- lgbm.ipynb
|   |-- lgbm_baseline.py
|   |-- lgbm_group_kfold.ipynb
|-- lightgcn
|   |-- README.md
|   |-- config.py
|   |-- inference.py
|   |-- install.sh
|   |-- lightgcn
|   |-- train.py
|-- lightgcn_custom
|   |-- README.md
|   |-- config.py
|   |-- inference.py
|   |-- install.sh
|   |-- lightgcn
|   |-- requirements_lightgcn_custom.txt
|   |-- train.py
  • (1) boosting folder
    • LGBM, XGBoost, CatBoost baseline code
  • (2) dkt folder
    • Baseline code for LSTM-family models
  • (3) ensembles
    • Ensemble code (weighted, voting, and mixed methods)
  • (4) lgbm
    • LGBM baseline code
  • (5) lightgcn
    • LightGCN baseline code
  • (6) lightgcn_custom
    • LightGCN + BERT, LightGCN + feature representation code

1-5. Data Structure

  • userID: unique ID of the user
  • testId: unique ID of the test
  • assessmentItemID: unique ID of the item (question)
  • answerCode: binary value indicating whether the user answered the item correctly
  • Timestamp: the time at which the user started solving the item
  • KnowledgeTag: a tag assigned to each item, one per item
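To make the column layout concrete, here is a minimal sketch of turning raw interaction rows into per-user, time-ordered sequences, which is the usual input shape for sequence models in knowledge tracing. The sample values, the timestamp format, and the helper name `build_user_sequences` are assumptions for illustration and are not taken from the repository.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows; only the column names come from the list above.
rows = [
    {"userID": 1, "testId": "T01", "assessmentItemID": "A002",
     "answerCode": 0, "Timestamp": "2022-11-14 09:05:00", "KnowledgeTag": 10},
    {"userID": 2, "testId": "T01", "assessmentItemID": "A001",
     "answerCode": 1, "Timestamp": "2022-11-14 09:00:00", "KnowledgeTag": 10},
    {"userID": 1, "testId": "T01", "assessmentItemID": "A001",
     "answerCode": 1, "Timestamp": "2022-11-14 09:01:00", "KnowledgeTag": 10},
]

# Assumed timestamp layout; adjust to the real data if it differs.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_user_sequences(rows):
    """Group interactions by userID and sort each group chronologically."""
    sequences = defaultdict(list)
    for row in rows:
        sequences[row["userID"]].append(row)
    for interactions in sequences.values():
        interactions.sort(
            key=lambda r: datetime.strptime(r["Timestamp"], TIME_FORMAT)
        )
    return dict(sequences)


sequences = build_user_sequences(rows)
# The model sees every interaction except the last one and predicts
# the answerCode of the last one.
history = [r["answerCode"] for r in sequences[1][:-1]]
target = sequences[1][-1]["answerCode"]
```

Sorting by Timestamp before splitting history from target matters because the prediction target is always the most recent interaction of a user.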

1-6. Metric

  • AUROC (Area Under the ROC Curve) and Accuracy
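As a reference for how the two metrics behave, the following is a small dependency-free sketch. AUROC is computed as the probability that a randomly chosen correct-answer row is scored higher than a randomly chosen incorrect-answer row (ties count as one half). The 0.5 decision threshold for accuracy is an assumption; the README does not state the threshold used by the competition.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of rows whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def auroc(y_true, y_prob):
    """Pairwise AUROC: P(score of a positive > score of a negative)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, y_prob))     # 0.75
print(accuracy(y_true, y_prob))  # 0.75
```

The pairwise loop is quadratic and only meant for illustration; on real data a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.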

2. Team Composition and Roles

| Member | Roles |
|---|---|
| 구혜인 | Data EDA; BERT model |
| 권은채 | Data EDA; XGB model |
| 박건영 | Data EDA; Last Query model |
| 장현우 | Data EDA; LSTM+Attention model |
| 정현호 | Data EDA; LightGBM model |
| 허유진 | Data EDA; LightGCN model |

3. Project Progress

3-1. Preliminary Planning

  • 22.11.10 (Thu): Offline meeting before the DKT project
  • 22.11.14 (Mon): Model seminar
  • Schedule
    • 22.11.14 (Mon) ~ 22.11.20 (Sun): EDA
    • 22.11.14 (Mon) ~ 22.12.02 (Fri): Feature Engineering
    • 22.11.23 (Wed) ~ 22.12.02 (Fri): Modeling
    • 22.12.03 (Sat) ~ 22.12.09 (Fri): Optimization

3-2. Project Execution

[Figure: DKT drawio diagram]


4. Project Results

4-1. Model Performance and Results

■ Results (top 4 by AUROC score): 7th place on the private leaderboard

| LSTMAttention | BERT | LastQuery | XGBoost | LightGBM | LightGCN |
|---|---|---|---|---|---|
| 0.7594 | 0.7791 | 0.8063 | 0.8114 | 0.8210 | 0.7823 |

| Final selection | Models (ensemble ratio) | Public AUROC | Private AUROC |
|---|---|---|---|
| O | LightGBM, LightGCN, LastQuery (0.65, 0.1, 0.25) | 0.8253 | 0.8479 |
| O | LightGBM, LightGCN, LastQuery (0.7, 0.1, 0.2) | 0.8252 | 0.8476 |
| X | LightGBM, LastQuery, XGBoost, LightGCN x3 (hard voting); the three LightGCN variants are LightGCN, LightGCN + feature representation, LightGCN + BERT | 0.8094 | 0.8531 |
| X | LightGBM, LightGCN, LastQuery (0.65, 0.15, 0.2) | 0.8232 | 0.8506 |
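The ensemble ratios in the table can be read as weights in a weighted average of per-model predicted probabilities. The sketch below illustrates that reading together with a simple majority-vote variant. It is not the code in `ensembles/ensembles.py`; the function names, the sample probabilities, the 0.5 voting threshold, and the tie rule are assumptions.

```python
def weighted_ensemble(predictions, weights):
    """Blend per-model probability lists with weights that sum to 1."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per model is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_rows = len(predictions[0])
    return [
        sum(w * preds[i] for preds, w in zip(predictions, weights))
        for i in range(n_rows)
    ]


def hard_voting(predictions, threshold=0.5):
    """Majority vote over thresholded predictions; a tie resolves to 0."""
    n_rows = len(predictions[0])
    n_models = len(predictions)
    labels = []
    for i in range(n_rows):
        votes = sum(1 for preds in predictions if preds[i] >= threshold)
        labels.append(1 if 2 * votes > n_models else 0)
    return labels


# Hypothetical probabilities for three test rows from three models.
lgbm = [0.9, 0.2, 0.6]
lightgcn = [0.7, 0.4, 0.3]
lastquery = [0.8, 0.1, 0.5]

blended = weighted_ensemble([lgbm, lightgcn, lastquery], [0.65, 0.1, 0.25])
voted = hard_voting([lgbm, lightgcn, lastquery])
```

Because AUROC depends only on the ranking of scores, a weighted average keeps fine-grained score differences that a 0/1 majority vote discards.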

4-2. Model Overview

    1. Transformer-family models
      1. LSTM + Attention
      2. BERT
      3. LastQuery
    2. Boosting-family models
      1. LightGBM
      2. XGBoost
    3. Graph model
      1. LightGCN

4-3. Model Selection

  • Baseline code
    • LightGBM
      • The dataset provides very few columns, so most features had to be engineered. Although most of the given columns, such as assessmentItemID, testId, and KnowledgeTag, are categorical, we expected to rely heavily on statistical features and therefore decided to postpone using CatBoost. In addition, because the dataset is not small, we judged LGBM to be more efficient than XGBoost.
  • Additional model choices
    • LastQuery
      • This model took 1st place in the Riiid competition. Its performance improved with sequence length, and it needs less feature engineering than other transformer-family models, so we chose it for its advantages in both modeling time and performance.
    • XGBoost
      • Because LightGBM performed well, we additionally used XGBoost for comparison. It is a similar CART (classification and regression tree) ensemble model and exposes a wide range of hyperparameters to tune.
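The statistical features mentioned for the LightGBM baseline are not listed in this README. As one plausible illustration, the sketch below derives per-user running statistics (number of items solved so far and accuracy so far) in a way that uses only past interactions, so the label of the current row never leaks into its own features. The feature names and the helper `add_user_history_features` are hypothetical.

```python
from collections import defaultdict


def add_user_history_features(interactions):
    """Add running per-user statistics computed from earlier rows only.

    `interactions` must already be sorted chronologically and contain
    the userID and answerCode columns.
    """
    solved = defaultdict(int)
    correct = defaultdict(int)
    enriched = []
    for row in interactions:
        user = row["userID"]
        n, c = solved[user], correct[user]
        enriched.append({
            **row,
            "user_prev_count": n,
            "user_prev_correct": c,
            # No history yet: leave the accuracy undefined.
            "user_prev_acc": c / n if n else None,
        })
        # Update the counters only after the features were emitted.
        solved[user] += 1
        correct[user] += row["answerCode"]
    return enriched


# Hypothetical, already time-sorted interactions.
interactions = [
    {"userID": 1, "answerCode": 1},
    {"userID": 1, "answerCode": 0},
    {"userID": 2, "answerCode": 0},
    {"userID": 1, "answerCode": 1},
]
features = add_user_history_features(interactions)
```

The same pattern extends to other grouping keys such as KnowledgeTag or testId by keying the counters on that column instead of userID.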

4-4. Methods for Improving Model Performance

  • Hyperparameter tuning (Wandb, Sweep, Optuna)
  • K-fold
  • Ensemble
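For the K-fold item, the repository tree includes `lgbm_group_kfold.ipynb`, which suggests folds grouped by some key. A minimal sketch of group-wise splitting by userID follows, so that all interactions of one user fall on the same side of each split. The round-robin assignment of users to folds is a simplifying assumption; scikit-learn's `GroupKFold` balances fold sizes instead, and the notebook's actual grouping key is not stated here.

```python
def group_kfold_indices(groups, n_splits):
    """Yield (train_idx, valid_idx) so that no group spans both sides."""
    unique_groups = sorted(set(groups))
    if n_splits < 2 or n_splits > len(unique_groups):
        raise ValueError("n_splits must be between 2 and the number of groups")
    # Round-robin assignment of groups to folds (an assumption).
    fold_of = {g: i % n_splits for i, g in enumerate(unique_groups)}
    for fold in range(n_splits):
        valid_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train_idx, valid_idx


# Hypothetical userID column for seven interaction rows.
user_ids = [1, 1, 2, 2, 3, 3, 4]
folds = list(group_kfold_indices(user_ids, n_splits=2))
```

Splitting by user rather than by row avoids validating on interactions from a user whose other interactions were seen during training, which would otherwise inflate validation AUROC.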

5. WrapUp Report

Level_2_DKT_랩업리포트 (wrap-up report)