[Content Improvement] Datasets used in sagemaker_featurestore_fraud_detection_python_sdk.ipynb

Question

kalyanr-agi opened this issue 2 years ago · 1 comment

kalyanr-agi commented 2 years ago

Link to the notebook
sagemaker-featurestore/sagemaker_featurestore_fraud_detection_python_sdk.ipynb

What aspects of the notebook can be improved?
Links to the datasets being used

What are your suggestions?
I can't find the datasets used in the example. A link to them would be great.

Answer 1 · 2023-06-29T17:36:21.000Z

Hi @kalyanr-agi,

Thank you for your question. Are you referring to this cell in particular?

import io

import boto3
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# NOTE: `region` is not defined in this cell; it is assumed to be set in an
# earlier notebook cell (an AWS region name string).
s3_client = boto3.client("s3", region_name=region)

# Bucket that hosts the example datasets for the current region.
fraud_detection_bucket_name = f"sagemaker-example-files-prod-{region}"

# Object keys of the two CSV files used by the notebook.
identity_file_key = (
    "datasets/tabular/fraud_detection/synthethic_fraud_detection_SA/sampled_identity.csv"
)
transaction_file_key = (
    "datasets/tabular/fraud_detection/synthethic_fraud_detection_SA/sampled_transactions.csv"
)

# Fetch both objects from S3.
identity_data_object = s3_client.get_object(
    Bucket=fraud_detection_bucket_name, Key=identity_file_key
)
transaction_data_object = s3_client.get_object(
    Bucket=fraud_detection_bucket_name, Key=transaction_file_key
)

# Read the response bodies into pandas DataFrames.
identity_data = pd.read_csv(io.BytesIO(identity_data_object["Body"].read()))
transaction_data = pd.read_csv(io.BytesIO(transaction_data_object["Body"].read()))

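If so, the code reads the data directly from a public S3 bucket, so there is no separate download link: the bucket name and the two file keys in the cell above are the dataset location. Can you try upgrading boto3 and sagemaker to their latest versions? Also, are you running this locally or in a SageMaker notebook within the AWS environment?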
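In case it helps anyone looking for the files themselves, here is a minimal sketch that spells out the S3 locations implied by the cell above and optionally saves the two CSV files locally. The bucket pattern and keys are copied from the notebook cell; the helper names `dataset_locations` and `download_datasets`, the `target_dir` parameter, and the example region `us-east-1` are my own assumptions for illustration and are not part of the notebook. It also assumes the bucket exists for your region and that your default boto3 credentials can read it.

```python
import os

# Bucket naming pattern and object keys copied from the notebook cell.
BUCKET_TEMPLATE = "sagemaker-example-files-prod-{region}"
KEY_PREFIX = "datasets/tabular/fraud_detection/synthethic_fraud_detection_SA"
FILE_NAMES = ("sampled_identity.csv", "sampled_transactions.csv")


def dataset_locations(region):
    """Return {file name: (bucket, key, s3 URI)} for the given region.

    Hypothetical helper: it only formats strings and makes no network calls.
    """
    bucket = BUCKET_TEMPLATE.format(region=region)
    locations = {}
    for name in FILE_NAMES:
        key = f"{KEY_PREFIX}/{name}"
        locations[name] = (bucket, key, f"s3://{bucket}/{key}")
    return locations


def download_datasets(region, target_dir="."):
    """Save both CSV files into target_dir using the default boto3 credentials.

    Hypothetical helper: assumes the bucket exists for `region` and is readable.
    """
    import boto3  # imported here so dataset_locations() works without boto3

    s3_client = boto3.client("s3", region_name=region)
    local_paths = []
    for name, (bucket, key, _uri) in dataset_locations(region).items():
        local_path = os.path.join(target_dir, name)
        s3_client.download_file(bucket, key, local_path)
        local_paths.append(local_path)
    return local_paths


if __name__ == "__main__":
    # "us-east-1" is only an example region; substitute your own.
    for name, (_bucket, _key, uri) in dataset_locations("us-east-1").items():
        print(name, uri)
```

Calling `download_datasets("us-east-1")` would then write the two files to the current directory, after which they can be opened with `pd.read_csv` as in the notebook.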