[Bug Report] XGBoost in SageMaker gives wildly different (and incorrect) results from my local model
pfan-well opened this issue · 0 comments
Describe the bug
I trained a binary classification model using SageMaker's XGBoost container (version 1.7-1).
I had previously developed an XGBoost model locally on the same dataset.
The positive rate in the dataset is very low, generally below 4%.
When I compare the predicted probabilities from the SageMaker built-in model with those from my local model, the results are the opposite of each other.
Given the low positive rate, I believe the SageMaker model's outputs are the incorrect ones.
See images.
I checked the training inputs and they are identical except for the file format: locally I fed the model a CSV file, whereas SageMaker XGBoost required the data in libsvm format. After double-checking, the training data are the same in both cases. I also used the same hyperparameters.
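One way to double-check that the CSV and libsvm files really carry the same rows is sketched below. It is only a minimal sketch under stated assumptions: the CSV is headerless with the label in the first column, the libsvm indices are 1-based (pass `zero_based=True` otherwise), and the function names and toy strings are hypothetical, not part of any library. Entries absent from a libsvm row are filled with 0.0 for the comparison. As far as I understand, XGBoost may treat such absent entries as missing values rather than zeros, so matching values here do not by themselves guarantee identical training behavior.

```python
import csv
import io


def parse_libsvm_line(line, num_features, zero_based=False):
    """Return (label, dense_features); absent indices are filled with 0.0."""
    parts = line.split()
    label = float(parts[0])
    features = [0.0] * num_features
    offset = 0 if zero_based else 1
    for item in parts[1:]:
        index, value = item.split(":")
        pos = int(index) - offset
        if not 0 <= pos < num_features:
            raise ValueError(f"feature index {index} out of range")
        features[pos] = float(value)
    return label, features


def find_mismatches(csv_text, libsvm_text, zero_based=False, tol=1e-9):
    """Compare a headerless, label-first CSV with libsvm rows, row by row."""
    csv_rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    svm_lines = [ln for ln in libsvm_text.splitlines() if ln.strip()]
    if len(csv_rows) != len(svm_lines):
        return [("row_count", len(csv_rows), len(svm_lines))]
    mismatches = []
    for i, (row, line) in enumerate(zip(csv_rows, svm_lines)):
        expected = [float(v) for v in row]
        label, features = parse_libsvm_line(line, len(expected) - 1, zero_based)
        actual = [label] + features
        if any(abs(a - b) > tol for a, b in zip(expected, actual)):
            mismatches.append((i, expected, actual))
    return mismatches


# Hypothetical toy data: two rows, three features, label in the first column.
csv_text = "0,1.5,0,2\n1,0,3,0\n"
libsvm_text = "0 1:1.5 3:2\n1 2:3\n"
mismatches = find_mismatches(csv_text, libsvm_text)
```

An empty list means every label and feature value agrees. A mismatch on the first element of a row would point at the labels, and a shift across all features would point at an index-base problem.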
To reproduce
For sagemaker:
from sagemaker.xgboost.estimator import XGBoost

# version 1: script mode
xgb_script_mode_estimator = XGBoost(
    entry_point=script_path,
    framework_version="1.7-1",
    # hyperparameters=hyperparameters,
    role=role,
    instance_count=2,
    instance_type=instance_type,
    output_path=output_path,
    code_location=output_path,
)
# calling fit
# version 2:
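import boto3
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri

# get_image_uri is deprecated in SageMaker Python SDK v2 in favor of sagemaker.image_uris.retrieve
container = get_image_uri(boto3.Session().region_name, "xgboost", "1.7-1")
xgb = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    # only two placeholders in the template, so the third argument ("no-show-xgb") is ignored
    output_path="s3://{}/{}/output".format(s3_bucket, key, "no-show-xgb"),
    sagemaker_session=sess,
)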
# calling fit
Is there any way to debug this issue?
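As a starting point for debugging, a small sanity check on the two sets of predicted probabilities might help narrow things down. The sketch below is illustrative only: `summarize` and `pearson` are hypothetical helper names, and the toy lists stand in for the real labels and model outputs. It compares each model's mean predicted probability with the observed positive rate and measures how the two models' predictions correlate.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if degenerate)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    denom = (var_x * var_y) ** 0.5
    return cov / denom if denom else float("nan")


def summarize(local_probs, sagemaker_probs, labels):
    """Collect simple statistics for comparing two models' probabilities."""
    return {
        "base_rate": mean(labels),
        "local_mean": mean(local_probs),
        "sagemaker_mean": mean(sagemaker_probs),
        "correlation": pearson(local_probs, sagemaker_probs),
    }


# Hypothetical toy values; replace with the real labels and predictions.
labels = [0, 0, 0, 1]
local_probs = [0.02, 0.05, 0.01, 0.80]
sagemaker_probs = [1 - p for p in local_probs]
report = summarize(local_probs, sagemaker_probs, labels)
```

If the correlation is close to -1 and `sagemaker_mean` is roughly `1 - local_mean`, one plausible explanation is that the labels were flipped somewhere in the data conversion, or that one output is the probability of the other class. A correlation near 0 would instead suggest the two models were trained on different features or rows.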