logstash-plugins/logstash-input-s3

[Logstash 7.16.2] S3 input plugin replaces the region in the endpoint URL

Opened this issue · 7 comments

Logstash information:

Please include the following information:

  1. Logstash version (e.g. bin/logstash --version)
    7.16.2
  2. Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker)
    docker
  3. How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes)
    kubernetes
  4. How was the Logstash Plugin installed
    Shipped with logstash 7.16.2

Description of the problem including expected versus actual behavior:

We have set up an interface endpoint for S3 and access our S3 bucket through it. When the S3 input plugin is configured to use that interface endpoint, we get an error saying name or service unknown.

Here is our configuration:

input {
    s3 {
        bucket => <our_bucket>
        type => ...
        sincedb_path => ...
        prefix => ...
        region => "us-east-1"
        endpoint => "https://<our_vpc_endpoint_id>.s3.us-east-1.vpce.amazonaws.com"
    }
}

Here is the error we get

[ERROR] 2022-01-25 07:31:54.612 [[main]<s3] javapipeline - A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::S3 bucket=>"<our_bucket>", endpoint=>"https://<<our_vpc_endpoint_id>.s3.us-east-1.vpce.amazonaws.com", prefix=>...., id=>...., type=>"elb", sincedb_path=>...., region=>"us-east-1", enable_metric=>true, codec=>"plain_82c47ed3-633f-4f89-b7ef-6a154796b950", enable_metric=>true, charset=>"UTF-8">, role_session_name=>"logstash", delete=>false, interval=>60, watch_for_new_files=>true, temporary_directory=>"/tmp/logstash", include_object_properties=>false, gzip_pattern=>".gz(ip)?$">
  Error: Failed to open TCP connection to <our_bucket>.<our_vpc_endpoint_id>.s3.vpce.amazonaws.com:443 (initialize: name or service not known)
  Exception: Seahorse::Client::NetworkingError
  Stack: uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:943:in `block in connect'
org/jruby/ext/timeout/Timeout.java:114:in `timeout'
org/jruby/ext/timeout/Timeout.java:90:in `timeout'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/n

From this error message

  Error: Failed to open TCP connection to <our_bucket>.<our_vpc_endpoint_id>.s3.vpce.amazonaws.com:443 (initialize: name or service not known)

It's clear that the region is being stripped from the actual endpoint URL (it should be <our_bucket>.<our_vpc_endpoint_id>.s3.us-east-1.vpce.amazonaws.com, with the region included).

Steps to reproduce:

Please include a minimal but complete recreation of the problem,
including (e.g.) pipeline definition(s), settings, locale, etc. The easier
you make it for us to reproduce, the more likely that somebody will take the
time to look at it.

1. Set up an interface endpoint for S3 -> https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html
2. Use the interface endpoint as the endpoint in the S3 plugin
3. Deploy Logstash

Provide logs (if relevant):

I can reproduce the same error; however, logstash-input-s3 does not modify the endpoint setting, see here and here. It passes it straight to the AWS SDK. The plugin uses the AWS SDK to access AWS services, so this could be an issue in the SDK, or the SDK may require additional setup to work with a VPC endpoint.
Needs further investigation.

@glen-uc Looks like you are encountering aws/aws-sdk-ruby#2483

Can you try setting the environment variable AWS_S3_US_EAST_1_REGIONAL_ENDPOINT to 'regional' and report back?

[Edit]
The environment variable has other issues, try setting

s3_us_east_1_regional_endpoint=regional

in your aws config file
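For reference, a sketch of what that setting looks like in the AWS shared config file (the `default` profile and the `~/.aws/config` path are the SDK defaults; adjust to your setup):

```ini
# ~/.aws/config (sketch)
[default]
s3_us_east_1_regional_endpoint = regional
```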

@robbavey Thank you for your reply.

We tried adding s3_us_east_1_regional_endpoint=regional to the config file and deployed Logstash with the interface endpoint, but we got this error:

 hostname "<bucket_name>.<vpc_id>.s3.us-east-1.vpce.amazonaws.com" does not match the server certificate
  Exception: Seahorse::Client::NetworkingError

Looks like it's not handling the region in the endpoint as expected. To get past this error, we tried setting ssl_verify_peer to false in additional_settings and redeployed Logstash, but it failed again, this time with this error:

{:exception=>Aws::S3::Errors::NoSuchBucket, :message=>"The specified bucket does not exist"

We verified that the specified bucket exists and that Logstash has the necessary permissions (it works when we use the default endpoint).

@glen-uc, we're seeing the same issue with the S3 output plugin (logstash-output-s3 v4.3.5). Our workaround is to set the endpoint value with two region strings, e.g.:

s3 {
   region => "us-east-1"
   endpoint =>  "https://<our_vpc_endpoint_id>.s3.us-east-1.us-east-1.vpce.amazonaws.com"
}

This "tricks" the plugin's logic into replacing the first us-east-1 string while keeping the second, so that the final value still contains the AWS region.
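A minimal Ruby sketch of the behaviour being described (hypothetical, not the actual aws-sdk-ruby code): if the client strips the first occurrence of the region from the hostname, doubling the region leaves one copy behind.

```ruby
# Hypothetical illustration of the double-region workaround; this is NOT
# the SDK's implementation, just the substitution observed in the errors.
def strip_first_region(host, region)
  # Remove only the first "<region>." occurrence from the hostname.
  host.sub("#{region}.", "")
end

single  = "vpce-123.s3.us-east-1.vpce.amazonaws.com"
doubled = "vpce-123.s3.us-east-1.us-east-1.vpce.amazonaws.com"

puts strip_first_region(single, "us-east-1")
# => vpce-123.s3.vpce.amazonaws.com (region gone, name resolution fails)
puts strip_first_region(doubled, "us-east-1")
# => vpce-123.s3.us-east-1.vpce.amazonaws.com (one region copy survives)
```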

@jacqclouseau Thank you for the suggestion

I tried your approach by adding one more region string to the endpoint URL, i.e. endpoint => "https://<our_vpc_endpoint_id>.s3.us-east-1.us-east-1.vpce.amazonaws.com"

But still, I am getting

{:exception=>Aws::S3::Errors::NoSuchBucket, :message=>"The specified bucket does not exist"

Here is my full logstash s3 input configuration

s3 {
    bucket => <my_bucket>
    type => <my_type>
    sincedb_path => <my_path>
    prefix => <my_prefix>
    endpoint => "<vpc_ep_id>.s3.us-east-1.us-east-1.vpce.amazonaws.com"
    region => "us-east-1"
    additional_settings => {
        "ssl_verify_peer" => false
    }
}

Note: If I remove the custom endpoint, Logstash works again, so this is not an issue with the bucket being missing.

@glen-uc, we've encountered only the name or service not known and hostname does not match the server certificate errors.

We've seen the name resolution error message when the 'us-east-1' string was removed from the target's address.

The certificate validation error was seen when we set the endpoint value to endpoint => "https://<vpc endpoint>.s3.us-east-1.us-east-1.vpce.amazonaws.com". After having a look at the certificate properties, we had to change the value to endpoint => "https://bucket.<vpc endpoint>.s3.us-east-1.us-east-1.vpce.amazonaws.com". I don't know if this applies only to our setup, or if that's how the AWS S3 certificates for VPC interface endpoints are generally created.

We could reproduce the cert issue without Logstash by running curl -v https://<bucket name>.<vpc endpoint>.s3.us-east-1.vpce.amazonaws.com, which gave us an SSL_ERROR_BAD_CERT_DOMAIN response. We then inspected the certificate by running the following commands:

fqdn='<bucket name>.<vpc interface>.s3.us-east-1.vpce.amazonaws.com'
echo | openssl s_client -showcerts -servername "${fqdn}" -connect "${fqdn}":443 2>/dev/null | openssl x509 -inform pem -noout -text

Looking at the X509v3 Subject Alternative Name values told us what names the certificate would recognise.

Running curl -v https://<bucket name>.bucket.<vpc endpoint>.s3.us-east-1.vpce.amazonaws.com was then just the confirmation we needed.

Apologies if I went off topic with the cert issue description.

@jacqclouseau Thank you for the detailed description, which allowed us to solve the problem with Logstash.

Here is what happened in our case

We were running Logstash in a K8s cluster along with other logging components like Fluent Bit. When we migrated to using interface endpoints for S3, we first changed Fluent Bit to use the interface endpoint by setting the endpoint to something like https://<vpc_ep_id>.s3.us-east-1.vpce.amazonaws.com, and it worked fine without any additional configuration (hence we assumed the bucket.<vpc_ep_id> prefix was not required when setting up the endpoint URL).

When doing the same for Logstash we encountered the errors described above, but we were finally able to solve it by setting the endpoint URL to something like https://bucket.<vpc_ep_id>.s3.us-east-1.us-east-1.vpce.amazonaws.com
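For anyone hitting the same issue, the final working configuration looked roughly like this (all bracketed values are placeholders, and only the endpoint-related options are shown):

```
input {
    s3 {
        bucket => "<my_bucket>"
        region => "us-east-1"
        # Note the leading "bucket." label and the doubled region string:
        endpoint => "https://bucket.<vpc_ep_id>.s3.us-east-1.us-east-1.vpce.amazonaws.com"
    }
}
```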

Note: not using bucket.<vpc_ep_id>... might also be the reason why setting s3_us_east_1_regional_endpoint=regional did not work. We will try setting that and using only one region in the endpoint URL, e.g. https://bucket.<vpc_ep_id>.s3.us-east-1.vpce.amazonaws.com, to see if it works.