- feature engineering
- feature list
- text length
- word count
- sentence count
- quoted sentence count
- consecutive_dots count
- word overlap count
- spell miss count, SpellChecker
- text embedding
- deberta v3
- target encoding of content and wording
- groupby word count (rounded to 5)
- sentence count (clip between 1 and 20)
- cv strategy
- Group K-Fold
- k=4
- group=prompt_id
- model
- first stage model
- fine tuned deberta v3 base (cv=0.55)
- inputs: prompt_question, text
- second stage models
- XGBoost (cv=0.5222257984694146)
- inputs: first stage output, text stats feature
- LightGBM (cv=0.5239820553996042)
- inputs: first stage output, text stats feature
- ensemble
- simple average of second stage models.
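
The text statistics in the feature list above can be sketched roughly as follows. This is a minimal illustration and not the code used in these experiments: the function name `text_stats`, the tokenization regex, and the exact definition of each count (for example, counting a "quoted sentence" as any span in straight double quotes) are assumptions. The spell miss count is left out because it depends on the third-party SpellChecker package.

```python
import re


def text_stats(text: str, prompt_text: str) -> dict:
    """Compute simple surface statistics for one summary (hypothetical helper)."""
    # Assumed tokenization: lowercase runs of letters, digits and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    prompt_words = set(re.findall(r"[A-Za-z0-9']+", prompt_text.lower()))
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "text_length": len(text),
        "word_count": len(words),
        "sentence_count": len(sentences),
        # Spans wrapped in straight double quotes.
        "quoted_sentence_count": len(re.findall(r'"[^"]+"', text)),
        # Runs of two or more dots, such as "..." or "..".
        "consecutive_dots_count": len(re.findall(r"\.{2,}", text)),
        # Distinct summary words that also occur in the prompt text.
        "word_overlap_count": len(set(words) & prompt_words),
    }
```

The resulting dictionary can be turned into one row of the text stats feature table that feeds the second stage models.
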
EXP_ID | Local CV | Public LB | Note |
---|---|---|---|
1 | 0.6687954845101823 | 0.599 | rf with simple text feature |
2 | 0.5148155805419965 | - | add feature of debertav3 text embeddings |
3 | 0.4903529269444470 | 0.509 | change model from rf to xgb |
4 | 0.4899955738087213 | - | add feature of debertav3 prompt embeddings |
5 | 0.4785185756657641 | - | add feature of overlap word and co-occur words |
6 | 0.4759433370779221 | - | add feature of tri-gram co-occur words |
7 | 0.4737618975123431 | - | change xgb n_estimators param 500 to 800 |
8 | 0.4744999729694380 | 0.479 | rm feature of debertav3 prompt embeddings |
9 | 0.5576348008005831 | 0.478 | change kfold to group kfold |
10 | 0.5572727558437666 | - | add feature of spell_miss_count |
11 | 0.5560561772865491 | 0.479 | add feature of quotes_count |
12 | 0.5451717268584183 | 0.559 | only finetuned deberta base |
13 | 0.5168956770838019 | 0.491 | stacking xgb on deberta |
14 | 0.5162055570275468 | - | ensemble lgbm |
15 | 0.5148750859363870 | 0.465 | add feature of target encoding |
16 | 0.5157331434893387 | 0.467 | refactoring create feature process |
17 | 0.5114196777076987 | 0.470 | add feature of wv similarity of prompt text and text |
18 | 0.5097619287866334 | - | add feature of some text stats |
19 | 0.509488368861999 | 0.467 | add feature of glove vec similarity |
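
In EXP 9 the local CV rose from about 0.474 to about 0.558 while the public LB stayed near 0.478, which is consistent with group-wise splits giving a stricter estimate than a plain k-fold. The sketch below shows one way to build such splits with k=4 and group=prompt_id, and how a target encoding like the one in EXP 15 could be fitted on the training fold only. It is an illustration under assumptions, not the code behind the table: `group_k_fold`, `bucket_word_count` and `target_encode` are hypothetical helpers, "rounded to 5" is read as rounding to the nearest multiple of 5, and fitting the encoding out of fold is an assumption about how leakage might be avoided. scikit-learn's `GroupKFold` gives the same no-shared-group guarantee.

```python
from collections import defaultdict


def group_k_fold(groups, k=4):
    """Yield (train_idx, valid_idx) pairs so that no group is split across folds."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    # Assign the largest groups first, always to the currently smallest fold.
    folds = [[] for _ in range(k)]
    for g in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        min(folds, key=len).extend(members[g])
    for f in range(k):
        valid = sorted(folds[f])
        train = sorted(i for j in range(k) if j != f for i in folds[j])
        yield train, valid


def bucket_word_count(n, step=5):
    """Round a word count to the nearest multiple of `step` (assumed meaning)."""
    return step * round(n / step)


def target_encode(train_keys, train_targets, valid_keys):
    """Mean target per key, fitted on the training fold only."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, y in zip(train_keys, train_targets):
        sums[key] += y
        counts[key] += 1
    # Keys never seen in the training fold fall back to the global mean.
    fallback = sum(train_targets) / len(train_targets)
    return [sums[key] / counts[key] if key in counts else fallback
            for key in valid_keys]
```

For each fold, the keys would be the bucketed word counts of the training and validation rows, and the targets would be the content or wording scores of the training rows.
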
- fine tuned roberta base (cv=0.5809940545327481) as first stage model
- inputs: prompt_question, text
- text averaged word2vec
- average and median of word length
- text length per sentence
- training deberta each other prompt and summary
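
The word-length and per-sentence items in the list above could be computed along these lines. This is only a sketch under assumptions: the helper name `word_length_stats`, the tokenization regex, and the reading of "text length per sentence" as characters divided by sentence count are not taken from the original experiments.

```python
import re
from statistics import mean, median


def word_length_stats(text: str) -> dict:
    """Average/median word length and text length per sentence (hypothetical helper)."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(w) for w in words]
    return {
        # Fall back to 0.0 for empty input so the features stay numeric.
        "word_length_mean": mean(lengths) if lengths else 0.0,
        "word_length_median": median(lengths) if lengths else 0.0,
        # Assumed definition: characters in the text divided by sentence count.
        "text_length_per_sentence": len(text) / len(sentences) if sentences else 0.0,
    }
```
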