nyctaxi-deploy-prd fails
ehsanmok opened this issue · 6 comments
The pipeline fails to create the prod stack at SagemakerMonitoringSchedule because of:
Resource handler returned message: "Error occurred during operation 'CREATE'." (RequestToken: 40af8897-76d2-abb5-6efc-ef8c6948d42b, HandlerErrorCode: GeneralServiceException)
Note that everything else succeeds, and the stack is deployed in us-east-1.
Hi @ehsanmok, are you using the latest code in master? The deploy role requires permissions to create a monitoring schedule. The specific errors are not visible from CFN.
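Since CFN hides the underlying error, one way to test the permissions theory is to ask IAM directly. A minimal sketch, assuming a boto3 IAM client is passed in; the helper name `missing_permissions` is hypothetical, and `sagemaker:CreateMonitoringSchedule` is only one action the deploy role plausibly needs (the full set is not stated in this thread). The simulation ignores resource-level conditions unless you supply them, so treat the result as a hint rather than proof.

```python
def missing_permissions(iam_client, role_arn, actions):
    """Return the actions that the role's policies do not allow.

    iam_client is expected to behave like boto3.client("iam").
    """
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=list(actions),
    )
    # EvalDecision is "allowed", "explicitDeny" or "implicitDeny".
    return [
        result["EvalActionName"]
        for result in response["EvaluationResults"]
        if result["EvalDecision"] != "allowed"
    ]


# Example call (requires boto3 and AWS credentials; ACCOUNT is a placeholder):
#   import boto3
#   missing_permissions(
#       boto3.client("iam"),
#       "arn:aws:iam::ACCOUNT:role/mlops-nyctaxi-deploy-role",
#       ["sagemaker:CreateMonitoringSchedule"],
#   )
```

An empty list means the simulated actions are allowed for that role.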
Yes, it's the latest CFT from the one-click launch button. The error is too generic and I can't find more details about it either.
Hi @ehsanmok, the CFN stack in S3 was out of date with the repository's pipeline.yml. It has now been updated, but you can fix your stack by updating it with the pipeline.yml in the master branch. This will update the DeployRole with permissions sufficient to create the monitoring schedule.
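For anyone who prefers scripting the update over the console, here is a rough sketch of updating the stack in place with a local copy of pipeline.yml. It assumes a boto3 CloudFormation client is passed in, that every existing parameter should keep its previous value (this fails if the new template drops a parameter), and that the template creates named IAM resources, hence `CAPABILITY_NAMED_IAM`. The function name is hypothetical. Templates larger than the `TemplateBody` size limit would need `TemplateURL` instead.

```python
def update_stack_with_template(cfn_client, stack_name, template_body):
    """Update an existing stack with a new template, reusing parameter values.

    cfn_client is expected to behave like boto3.client("cloudformation").
    """
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    # Assumption: keep every parameter exactly as it was set at creation.
    parameters = [
        {"ParameterKey": p["ParameterKey"], "UsePreviousValue": True}
        for p in stack.get("Parameters", [])
    ]
    return cfn_client.update_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=parameters,
        # Assumption: the template defines named IAM roles such as the deploy role.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )


# Example call (requires boto3 and AWS credentials):
#   import boto3
#   with open("pipeline.yml") as f:
#       update_stack_with_template(boto3.client("cloudformation"), "nyctaxi", f.read())
```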
Just updated from master, but it still failed with the same error.
Hi @ehsanmok, please ensure you updated the main nyctaxi stack; this will update the DeployRole, which is used by the nyctaxi-deploy-prd stack. I've re-tested this from scratch and validated that the pipeline works, so perhaps start again with a clean CFN setup to re-test if you are still having issues.
Yes, updated the main CFT and released the changes.
The first attempt to delete the main stack gave this error:
mlops-nyctaxi-deploy-role is invalid or cannot be assumed
The second attempt worked, but I had to delete all the artifacts (S3 bucket, endpoint, model, etc.) manually; this could be automated with Lambda and the crhelper package. After recreating the entire stack and running the mlops notebook, the pipeline fails to create nyctaxi-workflow with:
Resource handler returned message: "State Machine is being deleted: 'arn:aws:states:us-east-1:ACCOUNT:stateMachine:nyctaxi' (Service: AWSStepFunctions; Status Code: 400; Error Code: StateMachineDeleting; Request ID: 218c294f-53a2-44ba-9256-4cb227b43fa9; Proxy: null)" (RequestToken: 66428fdb-9fb6-3309-5ed8-04e7d868dbd1, HandlerErrorCode: GeneralServiceException)
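The StateMachineDeleting error suggests the previous state machine was still being removed when the new stack tried to create one with the same name, since Step Functions deletes state machines asynchronously. A possible workaround is to poll until the old one is really gone before recreating the stack. This is only a sketch, assuming a boto3 Step Functions client is passed in; the function name, timeout, and poll interval are hypothetical.

```python
import time


def wait_until_state_machine_deleted(sfn_client, state_machine_arn,
                                     timeout_s=300, poll_s=5, sleep=time.sleep):
    """Poll until the state machine no longer exists.

    Returns True once it is gone, False if the timeout is reached first.
    sfn_client is expected to behave like boto3.client("stepfunctions").
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # While deletion is in progress this call still succeeds.
            sfn_client.describe_state_machine(stateMachineArn=state_machine_arn)
        except sfn_client.exceptions.StateMachineDoesNotExist:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


# Example call (requires boto3 and AWS credentials; ACCOUNT is a placeholder):
#   import boto3
#   wait_until_state_machine_deleted(
#       boto3.client("stepfunctions"),
#       "arn:aws:states:us-east-1:ACCOUNT:stateMachine:nyctaxi",
#   )
```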
On the third try, I deleted everything and recreated the stack. Now the prod deployment is successful! Thanks for the very useful design :)
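On the manual cleanup mentioned above: the bucket is usually the part that blocks stack deletion, because CloudFormation will not delete a non-empty bucket. A minimal sketch of emptying one, assuming a boto3 S3 client is passed in; `empty_bucket` is a hypothetical helper that could be called from the delete handler of a Lambda-backed custom resource (for example one built with crhelper), and it does not cover the endpoint or model cleanup.

```python
def empty_bucket(s3_client, bucket):
    """Delete every object version and delete marker in the bucket.

    Returns the number of entries submitted for deletion.
    s3_client is expected to behave like boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_object_versions")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket):
        objects = [
            {"Key": item["Key"], "VersionId": item["VersionId"]}
            for item in page.get("Versions", []) + page.get("DeleteMarkers", [])
        ]
        # delete_objects accepts at most 1000 keys per request.
        for start in range(0, len(objects), 1000):
            batch = objects[start:start + 1000]
            s3_client.delete_objects(
                Bucket=bucket,
                Delete={"Objects": batch, "Quiet": True},
            )
            deleted += len(batch)
    return deleted
```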