guardian/riff-raff

ParseError exceptions don't retry

Closed this issue · 5 comments

sihil commented

We've seen a few occasions where a describeAutoScalingGroups API call fails with a ParseError while the XML response is being unmarshalled. We now have the exception (below) that causes this, so we should be able to write better retry handling. See #409 for further details.

We originally thought that an IOException was causing this, but that's not true: the root cause is a javax.xml.stream.XMLStreamException, wrapped in a com.amazonaws.AmazonClientException.

magenta.FailException: Unhandled exception in task SuspendAlarmNotifications Suspending Alarm Notifications - group will no longer scale on any configured alarms
        at magenta.DeployReporter$.magenta$DeployReporter$$failException(logging.scala:128)
        at magenta.DeployReporter$.failException(logging.scala:131)
        at magenta.DeployReporter$.withFailureHandling(logging.scala:106)
        at magenta.DeployReporter$.magenta$DeployReporter$$sendContext(logging.scala:118)
        at magenta.DeployReporter.taskContext(logging.scala:33)
        at deployment.actors.TasksRunner$$anonfun$receive$1$$anonfun$applyOrElse$1$$anonfun$apply$2.apply(TasksRunner.scala:36)
        at deployment.actors.TasksRunner$$anonfun$receive$1$$anonfun$applyOrElse$1$$anonfun$apply$2.apply(TasksRunner.scala:27)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at deployment.actors.TasksRunner$$anonfun$receive$1$$anonfun$applyOrElse$1.apply(TasksRunner.scala:27)
        at deployment.actors.TasksRunner$$anonfun$receive$1$$anonfun$applyOrElse$1.apply(TasksRunner.scala:25)
        at magenta.DeployReporter$$anonfun$magenta$DeployReporter$$sendContext$1.apply(logging.scala:119)
        at magenta.DeployReporter$$anonfun$magenta$DeployReporter$$sendContext$1.apply(logging.scala:118)
        at magenta.DeployReporter$.withFailureHandling(logging.scala:98)
        at magenta.DeployReporter$.magenta$DeployReporter$$sendContext(logging.scala:118)
        at magenta.DeployReporter.infoContext(logging.scala:38)
        at deployment.actors.TasksRunner$$anonfun$receive$1.applyOrElse(TasksRunner.scala:25)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
        at deployment.actors.TasksRunner.aroundReceive(TasksRunner.scala:12)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
Caused by: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[2841,5]
Message: Read timed out). Response Code: 200, Response Text: OK
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:1305)
        at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:908)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:715)
        at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:466)
        at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:427)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:376)
        at com.amazonaws.services.autoscaling.AmazonAutoScalingClient.doInvoke(AmazonAutoScalingClient.java:3422)
        at com.amazonaws.services.autoscaling.AmazonAutoScalingClient.invoke(AmazonAutoScalingClient.java:3392)
        at com.amazonaws.services.autoscaling.AmazonAutoScalingClient.describeAutoScalingGroups(AmazonAutoScalingClient.java:1280)
        at magenta.tasks.ASG$.listAutoScalingGroups$1(AWS.scala:176)
        at magenta.tasks.ASG$.groupForAppAndStage(AWS.scala:184)
        at magenta.tasks.ASGTask$class.execute(ASGTasks.scala:150)
        at magenta.tasks.SuspendAlarmNotifications.execute(ASGTasks.scala:122)
        at deployment.actors.TasksRunner$$anonfun$receive$1$$anonfun$applyOrElse$1$$anonfun$apply$2$$anonfun$apply$4.apply(TasksRunner.scala:37)
        at deployment.actors.TasksRunner$$anonfun$receive$1$$anonfun$applyOrElse$1$$anonfun$apply$2$$anonfun$apply$4.apply(TasksRunner.scala:36)
        at magenta.DeployReporter$$anonfun$magenta$DeployReporter$$sendContext$1.apply(logging.scala:119)
        at magenta.DeployReporter$$anonfun$magenta$DeployReporter$$sendContext$1.apply(logging.scala:118)
        at magenta.DeployReporter$.withFailureHandling(logging.scala:98)
        at magenta.DeployReporter$.magenta$DeployReporter$$sendContext(logging.scala:118)
        at magenta.DeployReporter.taskContext(logging.scala:33)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[2841,5]
Message: Read timed out
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:591)
        at com.sun.xml.internal.stream.XMLEventReaderImpl.peek(XMLEventReaderImpl.java:276)
        at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:220)
        at com.amazonaws.services.autoscaling.model.transform.AutoScalingGroupStaxUnmarshaller.unmarshall(AutoScalingGroupStaxUnmarshaller.java:46)
        at com.amazonaws.services.autoscaling.model.transform.DescribeAutoScalingGroupsResultStaxUnmarshaller.unmarshall(DescribeAutoScalingGroupsResultStaxUnmarshaller.java:56)
        at com.amazonaws.services.autoscaling.model.transform.DescribeAutoScalingGroupsResultStaxUnmarshaller.unmarshall(DescribeAutoScalingGroupsResultStaxUnmarshaller.java:33)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:101)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:43)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:1260)
        at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:908)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:715)
        at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:466)
        at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:427)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:376)
        at com.amazonaws.services.autoscaling.AmazonAutoScalingClient.doInvoke(AmazonAutoScalingClient.java:3422)
        at com.amazonaws.services.autoscaling.AmazonAutoScalingClient.invoke(AmazonAutoScalingClient.java:3392)
        at com.amazonaws.services.autoscaling.AmazonAutoScalingClient.describeAutoScalingGroups(AmazonAutoScalingClient.java:1280)
        at magenta.tasks.ASG$.listAutoScalingGroups$1(AWS.scala:176)
        at magenta.tasks.ASG$.groupForAppAndStage(AWS.scala:184)
        at magenta.tasks.ASGTask$class.execute(ASGTasks.scala:150)
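
The trace above shows the XMLStreamException nested two levels down, inside an AmazonClientException inside the magenta.FailException, so a retry check needs to walk the cause chain rather than match on the outermost exception type. Below is a minimal sketch of that idea in plain Java. It is not the change made in #415; the class name `RetryOnParseError`, the helper names and the attempt limit are all hypothetical, and it assumes the failing call surfaces as an unchecked exception.

```java
import java.util.function.Supplier;
import javax.xml.stream.XMLStreamException;

// Hypothetical helper: not part of riff-raff or the AWS SDK.
class RetryOnParseError {

    // Returns true if t, or any exception in its cause chain, is an
    // XMLStreamException. The depth limit guards against cyclic cause chains.
    static boolean causedByXmlStreamError(Throwable t) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
            if (c instanceof XMLStreamException) {
                return true;
            }
        }
        return false;
    }

    // Runs call up to maxAttempts times. Only failures caused by an
    // XMLStreamException are retried; anything else is rethrown immediately.
    static <T> T withRetry(int maxAttempts, Supplier<T> call) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (!causedByXmlStreamError(e)) {
                    throw e;
                }
                last = e; // remember the parse failure and try again
            }
        }
        throw last; // every attempt failed with a parse error
    }
}
```

A caller would wrap the describeAutoScalingGroups request in the supplier passed to `withRetry`. Retrying here is only reasonable because the call is a read; the same wrapper around a mutating call would need more care.
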
sihil commented

/cc @jfsoul FYI

sihil commented

This is a corresponding issue on the SDK: aws/aws-sdk-java#892

sihil commented

Fixed by #415

sihil commented

Unbelievably, the log is not helpful. I've raised #424 to address this.