aws-samples/aws-etl-orchestrator

Process Marketing Job not writing parquet file to S3

enr1c091 opened this issue · 2 comments

Hi,

I am running this sample and, for a reason I can't figure out, process_marketing_data.py isn't writing its output file to S3, and the Count: log line in CloudWatch Logs (CWL) shows 0. As a result, the Join step fails because it can't infer a schema from the Parquet file.

You should upload the sales sample data to aws-etl-orchestrator-demo-raw-data/sales and the marketing sample data to aws-etl-orchestrator-demo-raw-data/marketing.

For example:
aws s3 ls s3://aws-etl-orchestrator-demo-raw-data --region ap-northeast-1 --profile us-east-1 --recursive
2019-12-26 17:39:42 0 marketing/
2019-12-26 17:43:36 151746 marketing/MarketingData_QuickSightSample.csv
2019-12-26 17:42:55 0 sales/
2019-12-26 17:43:51 2002910 sales/SalesPipeline_QuickSightSample.csv
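
If it helps, here is a minimal sketch for checking a listing like the one above before starting the jobs. It only parses the text output of `aws s3 ls --recursive`. The function name `prefixes_without_data` and the default prefixes are my own assumptions for illustration, not part of the sample project.

```python
def prefixes_without_data(listing, required_prefixes=("sales/", "marketing/")):
    """Return the required prefixes that contain no non-empty object.

    `listing` is the text printed by `aws s3 ls s3://<bucket> --recursive`,
    where each line has the form: <date> <time> <size> <key>.
    Hypothetical helper, not part of aws-etl-orchestrator.
    """
    found = set()
    for line in listing.splitlines():
        parts = line.split(None, 3)  # date, time, size, key
        if len(parts) != 4:
            continue  # not an object line
        size, key = parts[2], parts[3]
        if not size.isdigit() or int(size) == 0:
            continue  # skip zero-byte "folder" markers such as marketing/
        for prefix in required_prefixes:
            if key.startswith(prefix):
                found.add(prefix)
    return [p for p in required_prefixes if p not in found]


# Listing copied from the example above
listing = """\
2019-12-26 17:39:42 0 marketing/
2019-12-26 17:43:36 151746 marketing/MarketingData_QuickSightSample.csv
2019-12-26 17:42:55 0 sales/
2019-12-26 17:43:51 2002910 sales/SalesPipeline_QuickSightSample.csv
"""
print(prefixes_without_data(listing))  # → []
```

An empty list means both prefixes hold at least one non-empty file. A prefix that shows up in the result is probably why a job reads zero rows and writes no Parquet output.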

As @liangruibupt pointed out, the sample data has to be uploaded to the raw-data bucket first. The project README has been updated with instructions for copying the datasets.