databricks/spark-deep-learning

sparkdl.xgboost getting stuck trying to map partitions

timpiperseek opened this issue · 0 comments

I am running the following code to try to fit a model:

from sparkdl.xgboost import XgboostClassifier

param = {
    'num_workers': 4,    # number of workers on the cluster, adjust as needed
    'missing': 0,
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'featuresCol': 'features',
    'labelCol': 'objective',
    'nthread': 32        # equal to the number of CPUs on each worker machine
}

train, test = data.randomSplit([0.001, 0.001])
xgb_classifier = XgboostClassifier(**param)
xgb_clf_model = xgb_classifier.fit(train)

When I run the model training on my Databricks cluster, it seems to get stuck while trying to map partitions.
It is using almost zero CPU on each worker, but the memory usage is slowly increasing.

[screenshot attached]

Is there anything I can do to get around this issue?
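
In case it helps with diagnosing, here is a minimal sketch of what I could check before calling fit() (the names train, param, and xgb_classifier come from the snippet above; the repartition step is just an assumption about something worth trying, not a confirmed fix):

# Minimal diagnostic sketch: compare the partition count of the training
# split against the configured num_workers (names taken from the snippet above).
num_partitions = train.rdd.getNumPartitions()
print(f"train partitions: {num_partitions}, num_workers: {param['num_workers']}")

# Assumption: repartitioning the training data to match num_workers before
# calling fit() might be worth trying, but it is not a confirmed fix.
train = train.repartition(param['num_workers'])
xgb_clf_model = xgb_classifier.fit(train)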