[Content Improvement] Datasets used in sagemaker_featurestore_fraud_detection_python_sdk.ipynb
kalyanr-agi opened this issue · 1 comment
kalyanr-agi commented
Link to the notebook
sagemaker-featurestore/sagemaker_featurestore_fraud_detection_python_sdk.ipynb
What aspects of the notebook can be improved?
Links to the dataset being used.
What are your suggestions?
I can't find the dataset being used in the example. A link to it would be great.
netsatsawat commented
Hi @kalyanr-agi,
Thank you for your question. Are you referring to this cell in particular?
# Note: in the notebook, boto3 is imported and `region` is set in an earlier cell;
# they are repeated here so the snippet is self-contained.
import io

import boto3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

region = boto3.Session().region_name

s3_client = boto3.client("s3", region_name=region)

# Public S3 bucket that hosts the SageMaker example datasets for your region
fraud_detection_bucket_name = f"sagemaker-example-files-prod-{region}"
identity_file_key = (
    "datasets/tabular/fraud_detection/synthethic_fraud_detection_SA/sampled_identity.csv"
)
transaction_file_key = (
    "datasets/tabular/fraud_detection/synthethic_fraud_detection_SA/sampled_transactions.csv"
)

# Fetch both CSV objects from S3 and load them into pandas DataFrames
identity_data_object = s3_client.get_object(
    Bucket=fraud_detection_bucket_name, Key=identity_file_key
)
transaction_data_object = s3_client.get_object(
    Bucket=fraud_detection_bucket_name, Key=transaction_file_key
)
identity_data = pd.read_csv(io.BytesIO(identity_data_object["Body"].read()))
transaction_data = pd.read_csv(io.BytesIO(transaction_data_object["Body"].read()))
If yes, the code reads the data directly from a public S3 bucket (sagemaker-example-files-prod-&lt;region&gt;), so there is no separate download link; the two CSV files are fetched from that bucket at runtime. Can you try upgrading boto3 and sagemaker to the latest versions? Also, are you running this locally or in a SageMaker notebook within the AWS environment? The sketch below shows one way to check your installed versions and confirm the dataset files are reachable.
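For reference, here is a minimal diagnostic sketch (not part of the notebook) that prints the installed library versions and lists the dataset objects in the public bucket. It assumes the same bucket name and key prefix shown in the cell above, and that your AWS credentials and default region are configured.

import boto3
import sagemaker

# Confirm the installed SDK versions (the example expects reasonably recent ones)
print("boto3:", boto3.__version__)
print("sagemaker:", sagemaker.__version__)

# List the fraud-detection dataset files in the public example bucket.
# The bucket name and key prefix are taken from the notebook cell above.
region = boto3.Session().region_name
bucket = f"sagemaker-example-files-prod-{region}"
prefix = "datasets/tabular/fraud_detection/synthethic_fraud_detection_SA/"

s3 = boto3.client("s3", region_name=region)
response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

If the listing shows the two sampled_*.csv files but get_object still fails, the problem is more likely an outdated SDK or a credentials/region issue than a missing dataset.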