catboost/catboost

CatBoost for Apache Spark AUC eval metric not working as expected.

VincentHanxiaoDu opened this issue · 3 comments

Problem: With catboost_spark.CatBoostClassifier, setting the eval metric to "AUC" does not work as expected: the reported test AUC stays near 0.5 throughout training.
catboost version: 1.2.5
Operating System: CentOS Linux release 7.9.2009
CPU: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
GPU: Not installed.

My code:

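from pyspark.sql import SparkSession

# Start a Spark session with the CatBoost for Apache Spark package available.
session = (
    SparkSession.builder
    .appName("test catboost")
    .master("yarn")
    .config("spark.jars.packages", "ai.catboost:catboost-spark_3.5_2.12:1.2.5")
    .enableHiveSupport()
    .getOrCreate()
)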

import catboost_spark
......

clf = (
    catboost_spark.CatBoostClassifier()
    .setLabelCol("meta_fpd_15")
    .setFeaturesCol("features")
    .setDepth(6)
    .setRandomSeed(42)
    .setEvalMetric("AUC")
    .setLearningRate(0.3)
    .setIterations(500)
)

model = clf.fit(train_pool, evalDatasets=[eval_pool])

The training log looks like this:


0:	test: 0.5061911	best: 0.5061911 (0)	total: 1.14s	remaining: 9m 30s
1:	test: 0.5052205	best: 0.5061911 (0)	total: 1.89s	remaining: 7m 50s
2:	test: 0.5022173	best: 0.5061911 (0)	total: 2.65s	remaining: 7m 19s
3:	test: 0.5015299	best: 0.5061911 (0)	total: 3.43s	remaining: 7m 5s
4:	test: 0.5024059	best: 0.5061911 (0)	total: 4.24s	remaining: 6m 59s
5:	test: 0.5021867	best: 0.5061911 (0)	total: 4.92s	remaining: 6m 45s
6:	test: 0.5020990	best: 0.5061911 (0)	total: 5.63s	remaining: 6m 36s
7:	test: 0.5017771	best: 0.5061911 (0)	total: 6.22s	remaining: 6m 22s
8:	test: 0.5021608	best: 0.5061911 (0)	total: 6.81s	remaining: 6m 11s
9:	test: 0.5020003	best: 0.5061911 (0)	total: 7.4s	remaining: 6m 2s
10:	test: 0.5021596	best: 0.5061911 (0)	total: 7.95s	remaining: 5m 53s
11:	test: 0.5025097	best: 0.5061911 (0)	total: 8.55s	remaining: 5m 47s
12:	test: 0.5024379	best: 0.5061911 (0)	total: 9.14s	remaining: 5m 42s
13:	test: 0.5024908	best: 0.5061911 (0)	total: 9.8s	remaining: 5m 40s
14:	test: 0.5026709	best: 0.5061911 (0)	total: 10.8s	remaining: 5m 50s
15:	test: 0.5026764	best: 0.5061911 (0)	total: 11.6s	remaining: 5m 49s

The test AUC stays near 0.5 at every iteration, which suggests the model is predicting almost at random.

After removing .setEvalMetric("AUC"), the log shows the default metric decreasing steadily on both learn and test:


0:	learn: 0.4242637	test: 0.4246950	best: 0.4246950 (0)	total: 3.64s	remaining: 30m 16s
1:	learn: 0.3237355	test: 0.3241643	best: 0.3241643 (1)	total: 4.21s	remaining: 17m 27s
2:	learn: 0.2840065	test: 0.2846486	best: 0.2846486 (2)	total: 4.76s	remaining: 13m 9s
3:	learn: 0.2659343	test: 0.2665972	best: 0.2665972 (3)	total: 5.34s	remaining: 11m 1s
4:	learn: 0.2534263	test: 0.2538457	best: 0.2538457 (4)	total: 5.88s	remaining: 9m 42s
5:	learn: 0.2473411	test: 0.2478402	best: 0.2478402 (5)	total: 6.45s	remaining: 8m 51s
6:	learn: 0.2438247	test: 0.2444119	best: 0.2444119 (6)	total: 7.03s	remaining: 8m 14s
7:	learn: 0.2399831	test: 0.2407306	best: 0.2407306 (7)	total: 7.56s	remaining: 7m 45s
8:	learn: 0.2351502	test: 0.2360657	best: 0.2360657 (8)	total: 8.13s	remaining: 7m 23s
9:	learn: 0.2332105	test: 0.2341404	best: 0.2341404 (9)	total: 8.68s	remaining: 7m 5s
10:	learn: 0.2318636	test: 0.2327980	best: 0.2327980 (10)	total: 9.2s	remaining: 6m 49s
11:	learn: 0.2299601	test: 0.2309502	best: 0.2309502 (11)	total: 9.77s	remaining: 6m 37s
12:	learn: 0.2286594	test: 0.2297084	best: 0.2297084 (12)	total: 10.3s	remaining: 6m 26s
13:	learn: 0.2279188	test: 0.2289262	best: 0.2289262 (13)	total: 10.9s	remaining: 6m 17s
14:	learn: 0.2266488	test: 0.2277037	best: 0.2277037 (14)	total: 11.4s	remaining: 6m 9s
15:	learn: 0.2258320	test: 0.2269671	best: 0.2269671 (15)	total: 12s	remaining: 6m 1s
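
Since the default metric improves steadily while the reported AUC does not, it may be worth checking whether the model is really random or only the logged AUC is off. One way to do that is to compute AUC independently from the eval labels and the predicted scores. The following is a minimal sketch in plain Python, not CatBoost's implementation: the function name roc_auc is hypothetical, it assumes binary labels with 1 as the positive class, and it uses the rank-sum (Mann-Whitney U) formulation with average ranks for tied scores.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    Hypothetical helper for cross-checking a logged AUC value.
    Assumes label 1 is the positive class; every other label is negative.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Indices sorted by ascending score.
    order = sorted(range(len(scores)), key=lambda idx: scores[idx])

    pos_rank_sum = 0.0
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        # 1-based average rank shared by every member of the tie group.
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            if labels[order[k]] == 1:
                pos_rank_sum += avg_rank
        i = j + 1

    # Mann-Whitney U statistic for the positive class, normalised to [0, 1].
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)
```

If a value computed this way on the eval predictions is well above 0.5 while the training log still reports about 0.5, the problem would be in how the metric is evaluated or reported rather than in the model. How the labels and scores are extracted from the Spark predictions is left out here, since it depends on details not shown in this report.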