amazon-archives/aws-sdk-ios-v1

S3 client doesn’t retry when a timeout occurs.

Closed this issue · 5 comments

First things first, I’m not sure if it’s related, but before anything else I call:

[AmazonErrorHandler shouldNotThrowExceptions];

Now when a timeout occurs, I would expect the S3 client to retry the request until ‘maxRetries’ is reached. However, it does not, because -[AmazonAbstractWebServiceClient shouldRetry:exception:] returns NO.

What’s happening is that the NSError is expected to be a real NSURLErrorDomain one, whereas the one that’s being checked is wrapped in an AWS ClientErrorDomain, as can be seen in the console in this screenshot:

[Screenshot of the console output, taken 2013-06-27 at 11:56:42 AM]

After backtracking a bit more, I noticed that this timeout isn’t actually from the NSURLConnection layer, but from -[AmazonServiceResponse timeout] here.

As I understand it at the moment, there are a couple of issues with the way this timeout is handled:

  1. An AmazonClientException is instantiated instead of an AmazonServiceException, which leads -[AmazonErrorHandler errorFromException:] to return an error with the AWSiOSSDKClientErrorDomain domain. In this process no real metadata about the error is communicated, only a sparse message.
  2. Then the delegate method -[S3PutObjectOperation_Internal request:didFailWithError:] is called here, which creates a new exception instance, this time of the appropriate type AmazonServiceException, but again with only the original sparse error message.

This all leads to there not being any real metadata on the error domain and code by the time it reaches -[AmazonAbstractWebServiceClient shouldRetry:exception:] and so the request is never retried.

For now, I’m going to try to make the -[AmazonServiceResponse timeout] method generate an NSError with the NSURLErrorDomain domain and the kCFURLErrorTimedOut error code.
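To illustrate the idea, here is a minimal Foundation-only sketch of such an error and of a domain/code check that would recognize it. MakeTimeoutError and IsRetryableTimeout are hypothetical names used only for this example; they are not SDK functions, and the real check in -[AmazonAbstractWebServiceClient shouldRetry:exception:] may look different. NSURLErrorTimedOut is the Foundation constant with the same value as kCFURLErrorTimedOut.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the SDK): builds a timeout error that
// carries a real domain and code instead of only a message.
static NSError *MakeTimeoutError(void) {
    NSDictionary *userInfo = @{ NSLocalizedDescriptionKey : @"The request timed out." };
    return [NSError errorWithDomain:NSURLErrorDomain
                               code:NSURLErrorTimedOut
                           userInfo:userInfo];
}

// Hypothetical stand-in for a retry check: it can only match when the
// error still has the NSURLErrorDomain domain and the timed-out code.
static BOOL IsRetryableTimeout(NSError *error) {
    return [error.domain isEqualToString:NSURLErrorDomain]
        && error.code == NSURLErrorTimedOut;
}
```

An error wrapped in AWSiOSSDKClientErrorDomain with only a message would fail this kind of check, which matches the behavior described above.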

I’ve applied this naive patch which fixes the bug.

Does it need work before it can be turned into a PR?

When a request times out, the client should not retry the request; this is the expected behavior. Please refer to this blog post for more details.

@yosukematsuda Please correct me if I’m wrong, but that article states:

If a request fails, it may be desirable to retry the request depending on the class of error that occurred. The AWS SDK for iOS will automatically retry on timeouts and certain service exceptions. The specific retry logic can always been seen in the source at our GitHub repo. This retry logic is used only if the request was executed synchronously.

Now I am in fact using async methods, but I’m doing so using the S3TransferManager, about which the article states:

A special feature of the S3TransferManager is that even when operating in asynchronous mode, file and part uploads will be retried.

Also, the code is there, so I’m not sure why you say this is expected behavior.

There are two kinds of timeouts: timeout and connectionTimeout. timeout is a hard limit on the total time the SDK will wait for a request to complete. So, regardless of maxRetries and connectionTimeout values, the SDK tries to cancel the request when the timeout occurs. Your modification basically makes timeout look like connectionTimeout so that the SDK will retry when the timeout occurs.
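The difference between the two settings can be modeled with a small simulation. This is only a sketch of the behavior described in this comment, not the SDK's code; RunWithRetries and its parameter failingAttempts (how many attempts hit the connection timeout before one succeeds) are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical model: each attempt that hits connectionTimeout is retried,
// but the overall timeout is a hard limit that ends the request no matter
// how many retries are left.
static BOOL RunWithRetries(NSTimeInterval timeout,
                           NSTimeInterval connectionTimeout,
                           NSInteger maxRetries,
                           NSInteger failingAttempts) {
    NSTimeInterval elapsed = 0;
    for (NSInteger attempt = 0; attempt <= maxRetries; attempt++) {
        BOOL attemptTimesOut = attempt < failingAttempts;
        if (attemptTimesOut) {
            elapsed += connectionTimeout;   // time spent waiting on this attempt
        }
        if (elapsed >= timeout) {
            return NO;                      // hard limit reached: cancel, no retry
        }
        if (!attemptTimesOut) {
            return YES;                     // this attempt succeeded
        }
        // Otherwise: connection timeout with budget left, so try again.
    }
    return NO;                              // retries exhausted
}
```

With a 240 second timeout and a 30 second connectionTimeout, two failed attempts still leave room for a successful third one. With a 60 second timeout, the second failed attempt already exhausts the hard limit and the request is cancelled.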

S3TransferManager will retry for connectionTimeout, but since timeout is a hard limit, it doesn't retry for timeout. What you are thinking of as timeout is probably connectionTimeout, so you may want to set a high value for timeout and configure connectionTimeout to an appropriate value for your use case.
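A configuration along those lines might look like the sketch below. It assumes the v1 client exposes timeout, connectionTimeout and maxRetries as properties, that S3TransferManager takes its client through an s3 property, and that the headers are available via the import shown; the import path, the helper name MakeTransferManager and all values are assumptions for illustration and should be checked against the SDK version in use.

```objc
#import <Foundation/Foundation.h>
#import <AWSS3/AWSS3.h>   // assumption: header path depends on the SDK v1 release

// Hypothetical helper: a transfer manager whose client has a generous hard
// limit and a shorter per-connection timeout that is eligible for retries.
static S3TransferManager *MakeTransferManager(NSString *accessKey, NSString *secretKey) {
    AmazonS3Client *s3 = [[AmazonS3Client alloc] initWithAccessKey:accessKey
                                                     withSecretKey:secretKey];
    s3.timeout = 600;            // hard limit on total request time (illustrative value)
    s3.connectionTimeout = 30;   // per-connection timeout, retried (illustrative value)
    s3.maxRetries = 5;           // upper bound on retries (illustrative value)

    S3TransferManager *transferManager = [S3TransferManager new];
    transferManager.s3 = s3;     // assumption: the manager uses this client for uploads
    return transferManager;
}
```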

I see, that makes total sense then. Thanks for the explanation!