aws/aws-sdk-cpp

S3 crt client timeouts and retry strategy are not working.

Manjunathagopi opened this issue · 15 comments

Describe the bug

Below mentioned s3 crt client configs we set.

Aws::S3Crt::S3CrtClient *s3_crt_client;
Aws::S3Crt::Model::GetObjectRequest object_request;
Aws::SDKOptions options;
Aws::InitAPI(options);
Aws::S3Crt::ClientConfiguration config;
config.throughputTargetGbps = 1;
config.partSize = 10*1024*1024;
config.httpRequestTimeoutMs = 100;
config.connectTimeoutMs = 100;
config.requestTimeoutMs = 100;  
s3_crt_client = Aws::New<Aws::S3Crt::S3CrtClient>("test", config); 

We are reading 25MB at a time and we are measuring how much time it is taking for getObject.
average of 500ms taken for each getObject
Even though we set all the timeouts s3 crt client is taking more time?
We even tried with no retry strategy, but still there is no use and the behaviour is same.

  config.httpRequestTimeoutMs = 100;
  config.connectTimeoutMs = 100;
  config.requestTimeoutMs = 100;  
  std::shared_ptr<Aws::Client::RetryStrategy> retry;
  retry.reset(Aws::New<Aws::Client::DefaultRetryStrategy>("test", 0, 25));
  config.retryStrategy = retry;

GetObject() is not exiting even timeout is reached. One time we saw GetObject() for size 25MB took 10seconds.
Basically behavior is same even if we didn't set timeouts or assign custom retry strategy.
we are running this in a c5a4x large instance which is running in ap-south-1 region and s3 bucket is also in the same region and account.
Please tell whats wrong with this and suggest how to reduce time taken for GetObject().

Expected Behavior

With zero retries GetObject should exit once timeout is reached.

Current Behavior

With zero retries GetObject is not exiting even timeout is reached.

Reproduction Steps

zero retries

#include <fcntl.h>
#include <unistd.h>


#include <aws/core/Aws.h>
#include <aws/core/client/DefaultRetryStrategy.h>
#include <aws/s3-crt/S3CrtClient.h>
#include <aws/s3-crt/model/HeadObjectRequest.h>
#include <aws/s3-crt/model/GetObjectRequest.h>
#include <aws/s3-crt/S3CrtErrors.h>

void usage(const char *bin)
{
    printf("USAGE: %s <s3_uri> <part_size(in bytes)> <out_file(optional)>\n", bin);
}

uint64_t get_curr_time_ms(void)
{
    struct  timespec   ts;
    uint64_t           ts_64;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    ts_64 = (uint64_t)((ts.tv_sec * 1000) + (ts.tv_nsec/1000000));
    return ts_64;
}

int main (int argc, char* argv[])
{
    char *s3_uri;
    uint64_t read_size = UINT64_MAX;
    uint64_t part_size;
    char *buf = NULL;
    uint64_t read_left;
    uint64_t to_read;
    uint64_t ret;
    uint64_t iter = 0;
    char *out_file = NULL;
    int out_fd = -1;

    if(argc < 5 || argc > 6)
    {
        usage(argv[0]);
        return 0;
    }

    s3_uri = argv[1];
    part_size = strtoull(argv[2], NULL, 10);
    if(argc == 6) out_file = argv[3];
    Aws::S3Crt::S3CrtClient *s3_crt_client;
    Aws::S3Crt::Model::GetObjectRequest object_request;
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    Aws::S3Crt::ClientConfiguration config;
    config.httpRequestTimeoutMs = 100;
    config.connectTimeoutMs = 100;
    config.requestTimeoutMs = 100;  
    std::shared_ptr<Aws::Client::RetryStrategy> retry;
    retry.reset(Aws::New<Aws::Client::DefaultRetryStrategy>("test", 0, 25));
    config.retryStrategy = retry;
    config.throughputTargetGbps = 1;
    config.partSize = 10*1024*1024;
    s3_crt_client = Aws::New<Aws::S3Crt::S3CrtClient>("test", config);

    char bucket[128] = {0};
    char key[128] = {0};
    if(sscanf(s3_uri, "s3://%[^/]/%s", bucket, key) == 2)
    {
        Aws::S3Crt::Model::HeadObjectRequest head_object_request;
        Aws::S3Crt::Model::HeadObjectOutcome outcome;
        head_object_request.SetBucket(bucket);
        head_object_request.SetKey(key);
        outcome = s3_crt_client->HeadObject(head_object_request);
        if(outcome.IsSuccess())
            printf("Object found, size:%lu, etag %s\n", outcome.GetResult().GetContentLength(),  outcome.GetResult().GetETag().c_str());
        object_request.SetBucket(bucket);
        object_request.SetKey(key);
    }

    if(out_file)
    {
        out_fd = open(out_file, O_CREAT | O_WRONLY);
        if(out_fd <= 2)
        {
            printf("Error in opening %s in write mode", out_file);
            return 0;
        }
    }
    int64_t pos = 0;

    buf = new char[part_size];
    read_left = read_size;
    while(read_left != 0)
    {
        ret = 0;
        int64_t retry_count = -1;
        to_read = part_size <= read_left ? part_size : read_left;
        iter ++;
        uint64_t start,end;
        Aws::S3Crt::Model::GetObjectOutcome outcome;
        object_request.SetRange(std::string("bytes=") + std::to_string(pos) + "-" + std::to_string(pos+to_read-1));
        object_request.SetResponseStreamFactory(
                [buf, to_read]()
                {
                std::unique_ptr<Aws::StringStream>
                stream(Aws::New<Aws::StringStream>("test"));
                stream->rdbuf()->pubsetbuf(static_cast<char*>(buf),
                        to_read);

                return stream.release();
                });
        start = get_curr_time_ms();
        outcome = s3_crt_client->GetObject(object_request);
        end = get_curr_time_ms();
        printf("Time taken for GetObject %lums\n",end-start);
        if(outcome.IsSuccess())
        {
            ret = outcome.GetResult().GetContentLength();

            if(ret <= to_read)
            {
                pos += ret;
                //log_info("Testing data %02x %02x", ((uint8_t*)buf)[0], ((uint8_t*)buf)[1]);
            }
            else
            {
                printf("This should never happen, pos_:%lu, requested:%lu, read:%lu\n",
                        pos, to_read, ret);
                break;
            }
        }
        else
        {
            printf("Failed to read, pos_:%lu, requested:%lu, error:%s, error_type:%d, error_code:%d\n",
                    pos, to_read, outcome.GetError().GetMessage().c_str(), outcome.GetError().GetErrorType(), outcome.GetError().GetResponseCode()) ;
            break;
        }
        if(ret == 0)
        {
            printf("Nothing read, read_left:%lubytes, iter:%lu\n", read_left, iter);
            break;
        }
        else if(ret < to_read)
        {
            printf("Read %ldbytes instead of %ldbytes, iter:%lu\n", ret, to_read, iter);
        }
        else
        {
            printf("Read %ld bytes, iter:%lu\n", ret, iter);
        }
        read_left -= ret;

        if(ret > 0 && out_fd > 2)
        {
            if(write(out_fd, buf, ret) != ret)
            {
                printf("Error in write!!! (%lu bytes)\n", ret);
                return 0;
            }
            fdatasync(out_fd);
        }
    }

    if(s3_crt_client)
    {
        Aws::Delete(s3_crt_client);
        s3_crt_client = NULL;
    }
    Aws::ShutdownAPI(options);

    if(buf) delete[] buf;
    if(out_fd > 2) close(out_fd);
    return 0;
}

default strategy

#include <fcntl.h>
#include <unistd.h>


#include <aws/core/Aws.h>
#include <aws/core/client/DefaultRetryStrategy.h>
#include <aws/s3-crt/S3CrtClient.h>
#include <aws/s3-crt/model/HeadObjectRequest.h>
#include <aws/s3-crt/model/GetObjectRequest.h>
#include <aws/s3-crt/S3CrtErrors.h>

void usage(const char *bin)
{
    printf("USAGE: %s <s3_uri> <offset in bytes> <read size(0->till end)> <part_size(in bytes)> <out_file(optional)>\n", bin);
}

uint64_t get_curr_time_ms(void)
{
    struct  timespec   ts;
    uint64_t           ts_64;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    ts_64 = (uint64_t)((ts.tv_sec * 1000) + (ts.tv_nsec/1000000));
    return ts_64;
}

int main (int argc, char* argv[])
{
    char *s3_uri;
    uint64_t offset;
    uint64_t read_size;
    uint64_t part_size;
    char *buf = NULL;
    uint64_t read_left;
    uint64_t to_read;
    uint64_t ret;
    uint64_t iter = 0;
    char *out_file = NULL;
    int out_fd = -1;

    if(argc < 5 || argc > 6)
    {
        usage(argv[0]);
        return 0;
    }

    s3_uri = argv[1];
    offset = strtoull(argv[2], NULL, 10);
    read_size = strtoull(argv[3], NULL, 10);
    if(read_size == 0) read_size = UINT64_MAX;
    part_size = strtoull(argv[4], NULL, 10);
    if(argc == 6) out_file = argv[5];
    Aws::S3Crt::S3CrtClient *s3_crt_client;
    Aws::S3Crt::Model::GetObjectRequest object_request;
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    Aws::S3Crt::ClientConfiguration config;
    config.httpRequestTimeoutMs = 100;
    config.connectTimeoutMs = 100;
    config.requestTimeoutMs = 100;  
    config.throughputTargetGbps = 1;
    config.partSize = 10*1024*1024;
    s3_crt_client = Aws::New<Aws::S3Crt::S3CrtClient>("test", config);

    char bucket[128] = {0};
    char key[128] = {0};
    if(sscanf(s3_uri, "s3://%[^/]/%s", bucket, key) == 2)
    {
        Aws::S3Crt::Model::HeadObjectRequest head_object_request;
        Aws::S3Crt::Model::HeadObjectOutcome outcome;
        head_object_request.SetBucket(bucket);
        head_object_request.SetKey(key);
        outcome = s3_crt_client->HeadObject(head_object_request);
        if(outcome.IsSuccess())
            printf("Object found, size:%lu, etag %s\n", outcome.GetResult().GetContentLength(),  outcome.GetResult().GetETag().c_str());
        object_request.SetBucket(bucket);
        object_request.SetKey(key);
    }

    if(out_file)
    {
        out_fd = open(out_file, O_CREAT | O_WRONLY);
        if(out_fd <= 2)
        {
            printf("Error in opening %s in write mode", out_file);
            return 0;
        }
    }
    int64_t pos = 0;

    buf = new char[part_size];
    read_left = read_size;
    while(read_left != 0)
    {
        ret = 0;
        int64_t retry_count = -1;
        to_read = part_size <= read_left ? part_size : read_left;
        iter ++;
        uint64_t start,end;
        Aws::S3Crt::Model::GetObjectOutcome outcome;
        object_request.SetRange(std::string("bytes=") + std::to_string(pos) + "-" + std::to_string(pos+to_read-1));
        object_request.SetResponseStreamFactory(
                [buf, to_read]()
                {
                std::unique_ptr<Aws::StringStream>
                stream(Aws::New<Aws::StringStream>("test"));
                stream->rdbuf()->pubsetbuf(static_cast<char*>(buf),
                        to_read);

                return stream.release();
                });
        start = get_curr_time_ms();
        outcome = s3_crt_client->GetObject(object_request);
        end = get_curr_time_ms();
        printf("Time taken for GetObject %lums\n",end-start);
        if(outcome.IsSuccess())
        {
            ret = outcome.GetResult().GetContentLength();

            if(ret <= to_read)
            {
                pos += ret;
                //log_info("Testing data %02x %02x", ((uint8_t*)buf)[0], ((uint8_t*)buf)[1]);
            }
            else
            {
                printf("This should never happen, pos_:%lu, requested:%lu, read:%lu\n",
                        pos, to_read, ret);
                break;
            }
        }
        else
        {
            printf("Failed to read, pos_:%lu, requested:%lu, error:%s, error_type:%d, error_code:%d\n",
                    pos, to_read, outcome.GetError().GetMessage().c_str(), outcome.GetError().GetErrorType(), outcome.GetError().GetResponseCode()) ;
            break;
        }
        if(ret == 0)
        {
            printf("Nothing read, read_left:%lubytes, iter:%lu\n", read_left, iter);
            break;
        }
        else if(ret < to_read)
        {
            printf("Read %ldbytes instead of %ldbytes, iter:%lu\n", ret, to_read, iter);
        }
        else
        {
            printf("Read %ld bytes, iter:%lu\n", ret, iter);
        }
        read_left -= ret;

        if(ret > 0 && out_fd > 2)
        {
            if(write(out_fd, buf, ret) != ret)
            {
                printf("Error in write!!! (%lu bytes)\n", ret);
                return 0;
            }
            fdatasync(out_fd);
        }
    }

    if(s3_crt_client)
    {
        Aws::Delete(s3_crt_client);
        s3_crt_client = NULL;
    }
    Aws::ShutdownAPI(options);

    if(buf) delete[] buf;
    if(out_fd > 2) close(out_fd);
    return 0;
}

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

1.11.269

Compiler and Version used

gcc (GCC) 4.8.5

Operating System and version

CentOS Linux and version 7

CRT retries and timeouts work differently from regular CPP SDK config option and CRT S3 client currently does not honor them. We have a backlog feature request here to improve the situation #2594.

As an alternative you can configure lowSpeedLimit to indicate to CRT to kill requests that are too slow.

Hi @DmitriyMusatkin, i tried setting lowSpeedLimit to 75MBps which will kill the requests that are below 75MBps and tried running the test.
Still some requests were taking around 1second to read 25MB(bitrate of around 27MBps) from s3.

Aws::S3Crt::ClientConfiguration config;
config.lowSpeedLimit = 78643200; 

Hi,
Any update on the above query?

the way low speed limit is configured in sdk is to kill connection if throughput dips under the specified number for a given number of intervals (3 by default and otherwise derived from request timeout) - https://github.com/aws/aws-sdk-cpp/blob/main/generated/src/aws-cpp-sdk-s3-crt/source/S3CrtClient.cpp#L330.

Hard to tell exactly, whats going on and we might need trace level logs to debug it further. Note: CRT already breaks up get requests into part requests based on the configured part size and your code does the same thing, so there might be something weird interaction between the 2 going on.

Hi @DmitriyMusatkin attaching the trace logs for this can you please check this?

Uploading aws_sdk_2024-06-04-11.log…

bad link? it just points to this issue

The log you show is less than 2 secs.

[DEBUG] 2024-06-04 11:36:31.624 task-scheduler [140710120072960] id=0x7ff98c003238: Scheduling gather_statistics task for future execution at time 1311437023221890
[DEBUG] 2024-06-04 11:36:31.965 task-scheduler [140709998868224] id=0x7ff988003d38: Scheduling gather_statistics task for future execution at time 1311437363810723
[DEBUG] 2024-06-04 11:36:31.966 task-scheduler [140709998868224] id=0x7ff988026158: Scheduling gather_statistics task for future execution at time 1311437364379715
...
[DEBUG] 2024-06-04 11:36:32.276 task-scheduler [140710120072960] id=0x7ff98c003238: Running gather_statistics task with <Canceled> status
[DEBUG] 2024-06-04 11:36:32.276 task-scheduler [140709998868224] id=0x7ff988003d38: Running gather_statistics task with <Canceled> status
[DEBUG] 2024-06-04 11:36:32.276 task-scheduler [140709998868224] id=0x7ff988026158: Running gather_statistics task with <Canceled> status

You can see the task to monitor the throughput all canceled within 1 sec, which they are scheduled to run after 1 sec. referring to here

The reason for those task to be canceled is because the request has completed.

[TRACE] 2024-06-04 11:36:32.275 S3MetaRequest [140710030337792] id=0x1837460 Meta request clean up finished.

As @DmitriyMusatkin said, SDK will only kill the connection unless it keeps being slow for a certain time.

Hi @TingDaoK @DmitriyMusatkin,
Below attached the aws trace logs for s3 read of 25 MB 8 times(total of 200MB). Every time read is taking around 1second for each 25MB read.

config.throughputTargetGbps = 1.0;
config.partSize = 10485760;
config.httpRequestTimeoutMs = 100;
config.connectTimeoutMs = 100;
config.requestTimeoutMs = 100;  
config.lowSpeedLimit = 78643200;

https://getshared.com/SvtM9D7w

Can you please explain why it is taking more time than the configured lowspeedlimt and targetthroughput and when will SDK kill the connection and what is the exact threshold to consider it to be slow?

Hi,
Any update on the above query?

We reviewed the logs.
A couple things are coming into play here

  • cpp sdk configures low speed limit interval to be 3s from logic here https://github.com/aws/aws-sdk-cpp/blob/main/generated/src/aws-cpp-sdk-s3-crt/source/S3CrtClient.cpp#L330
  • crt is destroying the connection pool when there are no active requests a given endpoint. That means all connections in that pool are released along with associated tracking. in your sample you are firing requests to a bucket sequentially one after the other, so what ends up happening is pool with a bunch of connections is created for a request, request uses that pool, and once request completes, pool is destroyed. Next request results in pool being recreated. This behavior has changed recently (sometime in april 2024 on crt side and cpp sdk probably imported that change couple weeks after)

what seems to happen in your case is:

  • you kick off request for 25 MB
  • internally it splits it into 10, 10, 5 MB requests and puts them on separate connections
  • each of those smaller requests complete under 1s (and some of them potentially under lowSpeedLimit for the duration of 1s, but because it only run for 1s, it never triggered the connection kill)
  • once the overall request completes, there is no more requests for a given bucket and it kills the connection pool

Hi @DmitriyMusatkin thanks for the detailed analysis.
I have few questions regarding this.

  1. Is the division of 25MB to 10,10,5MB is based on part size that is configured which is 10MB?
  2. To reproduce the requests going below lowSpeedLimit case, I purposely set the part size to 500MB, the lowSpeedLimit to 150MB, and the read size to 500MB. As expected, we encountered an error: "Http connection channel shut down due to failure to meet throughput minimum (aws-c-http: AWS_ERROR_HTTP_CHANNEL_THROUGHPUT_FAILURE)." However, the issue is that the operation does not exit immediately after 3 seconds as anticipated. Instead, the getObject operation exits after 30 seconds. This delay suggests that some form of retry mechanism might be occurring internally. How can we improve this?
 config.throughputTargetGbps = 1.0;
 config.partSize = 524288000;
 config.lowSpeedLimit = 157286400;

Attaching the trace logs download link for issue 2 LINK can you please check this?

Hi,
Any update on the above query?

  1. s3 crt client does automatic splitting of all puts and gets into part sized chunks (based on the part size configured on the client) and runs all of them in parallel to optimize throughput
  2. lowSpeedLimit works similar curl low speed limit in this case, it kills the connection because its really slow and tries to complete the request over a different different connection. (note: requestTimeoutMs for regular cpp s3 client maps to curls CURLOPT_LOW_SPEED_TIME as well). httpRequestTimeoutMs works by setting overall curl timeout for regular s3 client, but unfortunately as i mentioned earlier is not supported by s3 crt client yet.
    By default CRT will configure a number of retries for every request and i guess what you are observing are those retries as connection get killed (should be 5 retries by default with some backoff time between each). You could try to change the retries to make it fail faster, but thats probably not the best practice. Outside of that, im not sure there are other alternatives unfortunatelly, until crt starts supporting request timeouts. Using regular s3 cpp client might be an option to explore as well.