Process Marketing Job not writing parquet file to S3

Question

enr1c091 opened this issue 5 years ago · 2 comments

Hi,

I am running this sample and, for a reason I can't figure out, process_marketing_data.py isn't writing the output file to S3, and the Count: log in CWL returns 0. As a result, the Join step fails because it can't infer the schema of the parquet file.

Answer 1 · 2019-12-26T10:13:29.000Z

A Count of 0 suggests the job found no input files. You should upload the sales sample data to
aws-etl-orchestrator-demo-raw-data/sales and the marketing sample data to
aws-etl-orchestrator-demo-raw-data/marketing.

For example:
aws s3 ls s3://aws-etl-orchestrator-demo-raw-data --region ap-northeast-1 --profile us-east-1 --recursive
2019-12-26 17:39:42 0 marketing/
2019-12-26 17:43:36 151746 marketing/MarketingData_QuickSightSample.csv
2019-12-26 17:42:55 0 sales/
2019-12-26 17:43:51 2002910 sales/SalesPipeline_QuickSightSample.csv
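
The same check can be scripted before starting the jobs. The following is a minimal sketch, not part of the sample project: the helper names (`count_data_files`, `missing_prefixes`, `list_bucket_keys`) are hypothetical, the prefixes are taken from the listing above, and `list_bucket_keys` assumes boto3 is installed and credentials are configured.

```python
from typing import Dict, Iterable, List, Sequence

# Prefixes taken from the listing above; everything else here is illustrative.
EXPECTED_PREFIXES = ("marketing/", "sales/")


def count_data_files(
    keys: Iterable[str], prefixes: Sequence[str] = EXPECTED_PREFIXES
) -> Dict[str, int]:
    """Count real objects under each prefix.

    Keys ending in "/" are the zero-byte folder markers seen in the listing
    and are not counted as input data.
    """
    counts = {prefix: 0 for prefix in prefixes}
    for key in keys:
        if key.endswith("/"):
            continue
        for prefix in prefixes:
            if key.startswith(prefix):
                counts[prefix] += 1
    return counts


def missing_prefixes(
    keys: Iterable[str], prefixes: Sequence[str] = EXPECTED_PREFIXES
) -> List[str]:
    """Return the prefixes that contain no data files."""
    counts = count_data_files(keys, prefixes)
    return [prefix for prefix in prefixes if counts[prefix] == 0]


def list_bucket_keys(bucket: str) -> List[str]:
    """List every key in a bucket (assumes boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

For example, `missing_prefixes(list_bucket_keys("aws-etl-orchestrator-demo-raw-data"))` would return an empty list once both CSV files are uploaded. A non-empty result names the prefix that still needs data.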

Answer 2 · 2019-12-28T01:37:45.000Z

As @liangruibupt pointed out, the datasets need to be copied first. The project README has been updated with instructions for copying them.