sllynn/spark-xgboost

py4j.protocol.Py4JJavaError: An error occurred while calling o839.fit.


Hi

When testing XGBoost 1.5-SNAPSHOT on Spark 3.0, I get the error below, while XGBoost 1.4.2 works.

XGBoost was compiled with aws-s3 enabled, which produced xgboost4j_2.12-1.5.0-SNAPSHOT.jar.

test_sdf:  None
train_sdf.schema:  StructType(List(StructField(age,IntegerType,true),StructField(workclass,StringType,true),StructField(fnlwgt,DoubleType,true),StructField(education,StringType,true),StructField(education-num,DoubleType,true),StructField(marital-status,StringType,true),StructField(occupation,StringType,true),StructField(relationship,StringType,true),StructField(race,StringType,true),StructField(sex,StringType,true),StructField(capital-gain,DoubleType,true),StructField(capital-loss,DoubleType,true),StructField(hours-per-week,DoubleType,true),StructField(native-country,StringType,true),StructField(label,StringType,true)))
train_sdf.schema.fields:  [StructField(age,IntegerType,true), StructField(workclass,StringType,true), StructField(fnlwgt,DoubleType,true), StructField(education,StringType,true), StructField(education-num,DoubleType,true), StructField(marital-status,StringType,true), StructField(occupation,StringType,true), StructField(relationship,StringType,true), StructField(race,StringType,true), StructField(sex,StringType,true), StructField(capital-gain,DoubleType,true), StructField(capital-loss,DoubleType,true), StructField(hours-per-week,DoubleType,true), StructField(native-country,StringType,true), StructField(label,StringType,true)]
string_columns:  ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country', 'label']
string_column_map:  [('workclass', 'workclass_ix'), ('education', 'education_ix'), ('marital-status', 'marital-status_ix'), ('occupation', 'occupation_ix'), ('relationship', 'relationship_ix'), ('race', 'race_ix'), ('sex', 'sex_ix'), ('native-country', 'native-country_ix'), ('label', 'label_ix')]
target:  label_ix
not_string_cols:  ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
predictors:  ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week', 'workclass_ix', 'education_ix', 'marital-status_ix', 'occupation_ix', 'relationship_ix', 'race_ix', 'sex_ix', 'native-country_ix']
Tracker started, with env={}
Traceback (most recent call last):
  File "spark-xgboost_adultdataset_no_mlflow.py", line 118, in <module>
    model = cv.fit(train_sdf_prepared)
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/pyspark.zip/pyspark/ml/base.py", line 129, in fit
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/pyspark.zip/pyspark/ml/tuning.py", line 352, in _fit
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/python-env/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/python-env/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/pyspark.zip/pyspark/ml/tuning.py", line 352, in <lambda>
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/pyspark.zip/pyspark/ml/tuning.py", line 52, in singleTask
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/pyspark.zip/pyspark/ml/base.py", line 62, in __next__
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/pyspark.zip/pyspark/ml/base.py", line 103, in fitSingleModel
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/pyspark.zip/pyspark/ml/base.py", line 127, in fit
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/pyspark.zip/pyspark/ml/wrapper.py", line 321, in _fit
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/pyspark.zip/pyspark/ml/wrapper.py", line 318, in _fit_java
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/pyspark.zip/pyspark/sql/utils.py", line 131, in deco
  File "/mnt/yarn/local/usercache/offline/appcache/application_1619577760121_30585/container_e05_1619577760121_30585_01_000001/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o839.fit.
: ml.dmlc.xgboost4j.java.XGBoostError: XGBoostModel training failed
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$.postTrackerReturnProcessing(XGBoost.scala:750)
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainDistributed(XGBoost.scala:624)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:199)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:40)
	at org.apache.spark.ml.Predictor.fit(Predictor.scala:150)
	at org.apache.spark.ml.Predictor.fit(Predictor.scala:114)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
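
For readers trying to reproduce this: the values printed before the traceback (string_columns, string_column_map, target, not_string_cols, predictors) are consistent with a column-splitting step like the one below. This is only a sketch reconstructed from that printed output, not the actual script. The helper name `split_columns`, the `(name, type)` tuple input, and the `_ix` suffix handling are assumptions for illustration; the real script presumably reads the types from `train_sdf.schema.fields` and indexes the string columns with Spark.

```python
# Hypothetical reconstruction of the column bookkeeping implied by the debug output.
# Plain Python only; no Spark session is needed to follow the logic.

def split_columns(fields, label="label", suffix="_ix"):
    """fields: list of (column_name, type_name) pairs, mirroring the printed schema."""
    string_columns = [name for name, dtype in fields if dtype == "StringType"]
    # Each string column is paired with the name of its indexed counterpart.
    string_column_map = [(name, name + suffix) for name in string_columns]
    target = label + suffix
    not_string_cols = [name for name, dtype in fields if dtype != "StringType"]
    # Predictors: numeric columns first, then indexed string columns except the label.
    predictors = not_string_cols + [ix for name, ix in string_column_map if name != label]
    return string_columns, string_column_map, target, not_string_cols, predictors


# Column names and types copied from the train_sdf.schema output above.
fields = [
    ("age", "IntegerType"), ("workclass", "StringType"), ("fnlwgt", "DoubleType"),
    ("education", "StringType"), ("education-num", "DoubleType"),
    ("marital-status", "StringType"), ("occupation", "StringType"),
    ("relationship", "StringType"), ("race", "StringType"), ("sex", "StringType"),
    ("capital-gain", "DoubleType"), ("capital-loss", "DoubleType"),
    ("hours-per-week", "DoubleType"), ("native-country", "StringType"),
    ("label", "StringType"),
]

string_columns, string_column_map, target, not_string_cols, predictors = split_columns(fields)
print("target: ", target)
print("predictors: ", predictors)
```

Under these assumptions the 14 predictors and the `label_ix` target match the printed values, so the input to `cv.fit` looks well formed and the failure more likely lies in the training step itself.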

Can you try with XGBoost 1.0? That was the last version I tested this against.

> Can you try with XGBoost 1.0? That was the last version I tested this against.

As I mentioned in the original post, XGBoost 1.4.2 works with the same compile options.

@sllynn it would be really useful if you could share at least one combination of
xgboost4j × xgboost4j-spark × Spark × Python × {anything else, such as environment variables or settings} under which your notebook is known to work.
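
To make such a combination easy to report, something like the following could be run on the driver. It is a minimal sketch, not part of this repository: the function name `report_versions` and the default package list are assumptions, and `importlib.metadata` requires Python 3.8 or later (the traceback above shows Python 3.7, where the `importlib_metadata` backport would be needed instead). It only sees Python packages; the xgboost4j and xgboost4j-spark jar versions have to be read from the jar file names or the build.

```python
import platform
from importlib import metadata


def report_versions(packages=("pyspark", "py4j", "xgboost")):
    """Collect the Python version and installed versions of the given packages."""
    versions = {"python": platform.python_version()}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            # The package is not installed in this Python environment.
            versions[pkg] = "not installed"
    return versions


versions = report_versions()
for name, version in versions.items():
    print(f"{name}: {version}")
```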

Curious how you resolved it.