Error in Join Marketing and Sales Data
johnsontroye1 opened this issue · 7 comments
I have pulled down this repo and have it working until the last step (Join Marketing and Sales Data). I have tried to get past this unsuccessfully. Here's the error logged in Gluerunner CloudWatch logs:
[ERROR] 2018-07-18T15:17:26.792Z 88fb4fc4-8a9d-11e8-bec7-f7119107e998 Glue job "JoinMarketingAndSalesData" run with Run Id "jr_bebcc..." failed. Last state: FAILED. Error message: AnalysisException: u'Path does not exist: hdfs://ip-172-31-74-135.ec2.internal:8020/user/root/aa.etl-output-path/tmp/sales;'
Yes, I essentially cleaned out everything several times and reran to the same point of error. The only differences I see in the logs are the run ID and the IP address of the EC2 instance. Can you please tell me where I go to open a support case for this? Thank you.
There were 5 .json files in the repo that needed config changes.
- cloudformation/gluerunner-lambda-params.json
- lambda/s3-deployment-descriptor.json
- cloudformation/glue-resources-params.json
- lambda/gluerunner/gluerunner-config.json
- cloudformation/step-functions-resources-params.json
Would you mind sending me your .json files so I can compare them against what I have? Maybe I did mess up a configuration.
Thank you very much,
Troy
The cause could be a wrong parameter value in glue-resources-params.json:
{
"ParameterKey": "S3ETLOutputPath",
"ParameterValue": "<NO-DEFAULT>"
}
Please make sure ParameterValue is actually set to an S3 path, like:
s3://<bucket_name>/output
Not simply:
output
Because the latter will actually write the result to the local HDFS file system! That's why the Join Marketing and Sales Data step couldn't find the file.
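If it helps, here is a minimal sketch of a pre-deployment check for this mistake. It is not part of the repo; the function names are made up for illustration, and it assumes the params file is a JSON list of ParameterKey/ParameterValue objects like the snippet above. It flags S3ETLOutputPath when the value is not a full s3:// URI:

```python
import json
from urllib.parse import urlparse


def is_s3_uri(value):
    """Return True if value looks like s3://<bucket>/<optional prefix>."""
    parsed = urlparse(value)
    return parsed.scheme == "s3" and bool(parsed.netloc)


def find_non_s3_params(params, keys=("S3ETLOutputPath",)):
    """Return the keys (from `keys`) whose ParameterValue is not an S3 URI.

    `params` is assumed to be a list of
    {"ParameterKey": ..., "ParameterValue": ...} dicts.
    """
    return [
        p["ParameterKey"]
        for p in params
        if p.get("ParameterKey") in keys
        and not is_s3_uri(p.get("ParameterValue", ""))
    ]


def check_params_file(path, keys=("S3ETLOutputPath",)):
    """Load a params file and return the offending parameter keys."""
    with open(path) as f:
        return find_non_s3_params(json.load(f), keys)


if __name__ == "__main__":
    # Path as listed earlier in this thread, relative to the repo root.
    bad = check_params_file("cloudformation/glue-resources-params.json")
    for key in bad:
        print(f"{key} is not an s3:// path; output may end up on HDFS")
```

Running it from the repo root before deploying prints one line per offending parameter and prints nothing when the value is a full S3 path. Pass other parameter names through keys if you want to check them the same way.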
Config parameters and docs were updated to simplify the configuration process and make it less error prone.