CatBoost for Apache Spark AUC eval metric not working as expected.
VincentHanxiaoDu opened this issue · 3 comments
VincentHanxiaoDu commented
Problem: The eval metric for catboost_spark.CatBoostClassifier does not work as expected when it is set to "AUC".
catboost version: 1.2.5
Operating System: CentOS Linux release 7.9.2009
CPU: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
GPU: Not installed.
My code looks like this:
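from pyspark.sql import SparkSession

# Create the session first so the catboost-spark package is pulled in via spark.jars.packages.
session = (
    SparkSession.builder
    .appName("test catboost")
    .master("yarn")
    .config("spark.jars.packages", "ai.catboost:catboost-spark_3.5_2.12:1.2.5")
    .enableHiveSupport()
    .getOrCreate()
)
# Imported after the session is created.
import catboost_spark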
......
clf = (
    catboost_spark.CatBoostClassifier()
    .setLabelCol("meta_fpd_15")
    .setFeaturesCol("features")
    .setDepth(6)
    .setRandomSeed(42)
    .setEvalMetric("AUC")
    .setLearningRate(0.3)
    .setIterations(500)
)
model = clf.fit(train_pool, evalDatasets=[eval_pool])
The training log looks like this:
0: test: 0.5061911 best: 0.5061911 (0) total: 1.14s remaining: 9m 30s
1: test: 0.5052205 best: 0.5061911 (0) total: 1.89s remaining: 7m 50s
2: test: 0.5022173 best: 0.5061911 (0) total: 2.65s remaining: 7m 19s
3: test: 0.5015299 best: 0.5061911 (0) total: 3.43s remaining: 7m 5s
4: test: 0.5024059 best: 0.5061911 (0) total: 4.24s remaining: 6m 59s
5: test: 0.5021867 best: 0.5061911 (0) total: 4.92s remaining: 6m 45s
6: test: 0.5020990 best: 0.5061911 (0) total: 5.63s remaining: 6m 36s
7: test: 0.5017771 best: 0.5061911 (0) total: 6.22s remaining: 6m 22s
8: test: 0.5021608 best: 0.5061911 (0) total: 6.81s remaining: 6m 11s
9: test: 0.5020003 best: 0.5061911 (0) total: 7.4s remaining: 6m 2s
10: test: 0.5021596 best: 0.5061911 (0) total: 7.95s remaining: 5m 53s
11: test: 0.5025097 best: 0.5061911 (0) total: 8.55s remaining: 5m 47s
12: test: 0.5024379 best: 0.5061911 (0) total: 9.14s remaining: 5m 42s
13: test: 0.5024908 best: 0.5061911 (0) total: 9.8s remaining: 5m 40s
14: test: 0.5026709 best: 0.5061911 (0) total: 10.8s remaining: 5m 50s
15: test: 0.5026764 best: 0.5061911 (0) total: 11.6s remaining: 5m 49s
which suggests that the model's predictions on the eval set are essentially random, since an AUC of about 0.5 corresponds to random ranking.
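For reference, here is a minimal sketch of why an AUC near 0.5 reads as "random". It is plain Python and not CatBoost's implementation; the function name roc_auc and the toy labels and scores are made up for illustration. It computes AUC as the fraction of positive/negative pairs in which the positive gets the higher score, counting ties as one half.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney style) AUC for binary labels in {0, 1}.

    Hypothetical helper for illustration only, not part of catboost_spark.
    Runs in O(n_pos * n_neg), so it is meant for small samples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy data (made up): three negatives and three positives.
labels = [0, 0, 1, 1, 0, 1]
perfect = [0.1, 0.2, 0.8, 0.9, 0.3, 0.7]   # every positive outranks every negative
constant = [0.5] * 6                        # no ranking information at all
inverted = [1 - s for s in perfect]         # ranking exactly reversed

print(roc_auc(labels, perfect))    # 1.0
print(roc_auc(labels, constant))   # 0.5
print(roc_auc(labels, inverted))   # 0.0
```

Running something like this on a small sample of the model's predicted probabilities for the eval set, collected to the driver, would be one way to check whether the values in the log match an independently computed AUC.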
After removing .setEvalMetric("AUC"), the trace is:
0: learn: 0.4242637 test: 0.4246950 best: 0.4246950 (0) total: 3.64s remaining: 30m 16s
1: learn: 0.3237355 test: 0.3241643 best: 0.3241643 (1) total: 4.21s remaining: 17m 27s
2: learn: 0.2840065 test: 0.2846486 best: 0.2846486 (2) total: 4.76s remaining: 13m 9s
3: learn: 0.2659343 test: 0.2665972 best: 0.2665972 (3) total: 5.34s remaining: 11m 1s
4: learn: 0.2534263 test: 0.2538457 best: 0.2538457 (4) total: 5.88s remaining: 9m 42s
5: learn: 0.2473411 test: 0.2478402 best: 0.2478402 (5) total: 6.45s remaining: 8m 51s
6: learn: 0.2438247 test: 0.2444119 best: 0.2444119 (6) total: 7.03s remaining: 8m 14s
7: learn: 0.2399831 test: 0.2407306 best: 0.2407306 (7) total: 7.56s remaining: 7m 45s
8: learn: 0.2351502 test: 0.2360657 best: 0.2360657 (8) total: 8.13s remaining: 7m 23s
9: learn: 0.2332105 test: 0.2341404 best: 0.2341404 (9) total: 8.68s remaining: 7m 5s
10: learn: 0.2318636 test: 0.2327980 best: 0.2327980 (10) total: 9.2s remaining: 6m 49s
11: learn: 0.2299601 test: 0.2309502 best: 0.2309502 (11) total: 9.77s remaining: 6m 37s
12: learn: 0.2286594 test: 0.2297084 best: 0.2297084 (12) total: 10.3s remaining: 6m 26s
13: learn: 0.2279188 test: 0.2289262 best: 0.2289262 (13) total: 10.9s remaining: 6m 17s
14: learn: 0.2266488 test: 0.2277037 best: 0.2277037 (14) total: 11.4s remaining: 6m 9s
15: learn: 0.2258320 test: 0.2269671 best: 0.2269671 (15) total: 12s remaining: 6m 1s