S3 Output Plugin takes 45 seconds per "bucket =>" to start
dorth999 opened this issue · 6 comments
Posted to https://discuss.elastic.co/t/s3-input-plugin-with-many-s3-buckets-takes-30-minutes-to-start/184685
I also posted this issue to logstash-input-s3 by mistake instead of logstash-output-s3, so I'm starting it here; hopefully the logstash-input-s3 issue can be closed.
I'm using Logstash 7.1.1 and logstash-output-s3 4.1.9. I have 38 different output locations (S3 buckets) depending on the logic. Logstash is taking nearly 30 minutes to start.
[2019-06-07T01:21:21,711][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"7.1.1"}
[2019-06-07T01:50:55,383][INFO ][logstash.javapipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.wo...
When I replace all the S3 bucket locations with the file output plugin, it takes about 2 minutes.
[2019-06-07T01:13:56,530][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"7.1.1"}
[2019-06-07T01:15:42,680][INFO ][logstash.javapipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.wo...
I know the s3 plugin is validating that all these buckets actually exist and are writable before startup, but this seems excessively slow. I'm running Logstash in AWS on t2.mediums (2 core / 4GB). Once Logstash is up and running, these servers keep up without breaking a sweat.
If my solution scales to additional buckets and logic, I fear the startup time will be a huge issue when considering autoscaling in addition to being a pain during deployments.
Here's one of my s3 output sections. I have 38 different sections and buckets.
s3 {
  region => "us-east-1"
  bucket => "xxxxxx-prod"
  prefix => "%{+YYYY}/%{+MM}/%{+dd}"
  server_side_encryption => true
  server_side_encryption_algorithm => "AES256"
  time_file => 5
  codec => "json"
  canned_acl => "bucket-owner-full-control"
}
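The sections are wrapped in conditionals along these lines; the routing field and bucket names below are placeholders, not my real config:

output {
  # Hypothetical routing field; the real logic is more involved.
  if [log_type] == "app" {
    s3 {
      region => "us-east-1"
      bucket => "xxxxxx-app-prod"
      prefix => "%{+YYYY}/%{+MM}/%{+dd}"
      server_side_encryption => true
      server_side_encryption_algorithm => "AES256"
      time_file => 5
      codec => "json"
      canned_acl => "bucket-owner-full-control"
    }
  } else if [log_type] == "audit" {
    s3 {
      region => "us-east-1"
      bucket => "xxxxxx-audit-prod"
      # ...same remaining options as above
    }
  }
  # ...and so on for the remaining buckets.
}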
I have started reducing the complexity of my configuration to isolate the issue. The problem appears to be the same even if you use a single S3 bucket but with 28 output sections. So if you're trying to reproduce, a single bucket is fine.
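A minimal sketch of that reproduction, assuming a single writable test bucket (the bucket name here is a placeholder):

output {
  s3 {
    region => "us-east-1"
    bucket => "my-test-bucket"
    prefix => "a/%{+YYYY}/%{+MM}/%{+dd}"
    time_file => 5
    codec => "json"
  }
  s3 {
    region => "us-east-1"
    bucket => "my-test-bucket"
    prefix => "b/%{+YYYY}/%{+MM}/%{+dd}"
    time_file => 5
    codec => "json"
  }
  # Repeat the section for as many outputs as you want to time; each
  # one runs its own startup validation against the same bucket.
}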
I have enabled S3 object logging to see what calls are going to S3 and how much time they are taking, and I changed my configuration to use 1 bucket and 10 output sections. Upon starting, I can see that the plugin ATTEMPTS to create a test object, but that request times out with the following error:
Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
I then see a PutObject and a DeleteObject that occur within 1 second of each other - all using the same "programmatic-access-test-object". Here is what it looks like:
June 21st 2019, 12:03:02.000 PutObject RequestTimeout logstash-programmatic-access-test-object-2019-06-21 19:02:40 +0000
June 21st 2019, 12:03:23.000 PutObject logstash-programmatic-access-test-object-2019-06-21 19:02:40 +0000
June 21st 2019, 12:03:23.000 DeleteObject logstash-programmatic-access-test-object-2019-06-21 19:02:40 +0000
June 21st 2019, 12:03:44.000 PutObject RequestTimeout logstash-programmatic-access-test-object-2019-06-21 19:03:23 +0000
June 21st 2019, 12:04:04.000 PutObject logstash-programmatic-access-test-object-2019-06-21 19:03:23 +0000
June 21st 2019, 12:04:04.000 DeleteObject logstash-programmatic-access-test-object-2019-06-21 19:03:23 +0000
June 21st 2019, 12:04:25.000 PutObject RequestTimeout logstash-programmatic-access-test-object-2019-06-21 19:04:05 +0000
June 21st 2019, 12:04:45.000 PutObject logstash-programmatic-access-test-object-2019-06-21 19:04:05 +0000
June 21st 2019, 12:04:46.000 DeleteObject logstash-programmatic-access-test-object-2019-06-21 19:04:05 +0000
June 21st 2019, 12:05:06.000 PutObject RequestTimeout logstash-programmatic-access-test-object-2019-06-21 19:04:46 +0000
June 21st 2019, 12:05:27.000 PutObject logstash-programmatic-access-test-object-2019-06-21 19:04:46 +0000
June 21st 2019, 12:05:27.000 DeleteObject logstash-programmatic-access-test-object-2019-06-21 19:04:46 +0000
June 21st 2019, 12:05:47.000 PutObject RequestTimeout logstash-programmatic-access-test-object-2019-06-21 19:05:27 +0000
June 21st 2019, 12:06:08.000 PutObject logstash-programmatic-access-test-object-2019-06-21 19:05:27 +0000
June 21st 2019, 12:06:08.000 DeleteObject logstash-programmatic-access-test-object-2019-06-21 19:05:27 +0000
June 21st 2019, 12:06:28.000 PutObject RequestTimeout logstash-programmatic-access-test-object-2019-06-21 19:06:08 +0000
June 21st 2019, 12:06:49.000 PutObject logstash-programmatic-access-test-object-2019-06-21 19:06:08 +0000
June 21st 2019, 12:06:49.000 DeleteObject logstash-programmatic-access-test-object-2019-06-21 19:06:08 +0000
So for each output location the S3 Output Plugin takes about 45 seconds. If you have 10 outputs, startup takes about 450 seconds - roughly 7.5 minutes. If, as in my production case, you have 28 buckets, that's 1260 seconds, or 21 minutes, JUST TO START.
There is an issue noting the timeout error that was opened over 3 years ago (#75), but it is still open.
Once Logstash is up and running, things chug along just fine, but startup time is a BIG scalability issue.
I haven't dug into the code to see whether I can disable the validation of S3 buckets at startup (possibly a dangerous thing) or whether there is a way to debug what is timing out. Is this project being actively maintained?
Hi, I am facing the very same issue.
Any comments on how you managed to solve this problem would be greatly appreciated.
Try disabling metrics and validation with validate_credentials_on_root_bucket => false. That should help you a lot. More info here.
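For example, in each s3 section (enable_metric here is the generic per-plugin flag for disabling metrics; note that skipping the validation means a misconfigured bucket will only surface an error once real data is flushed):

s3 {
  region => "us-east-1"
  bucket => "xxxxxx-prod"
  # Skip the PutObject/DeleteObject probe at startup:
  validate_credentials_on_root_bucket => false
  # Generic Logstash per-plugin metrics switch:
  enable_metric => false
  # ...rest of the options as before
}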
Setting validate_credentials_on_root_bucket => false did indeed resolve my problem. My startup time dropped from about 28 minutes down to about 90 seconds.
Thank you!