lbroudoux/es-amazon-s3-river

What ports need to be open for the es-amazon-s3-river plugin to connect to the S3 endpoint?

Closed this issue · 1 comment

I have two queries:

  1. Which ports need to be open for the plugin to talk to the S3 endpoint? The reason I ask is that in my setup I'm able to connect and index the docs in S3 when I initially activate the plugin (via the curl command). However, if new files are subsequently added to the S3 bucket, the plugin does not pick them up. I see the following appearing continuously in the logs:

[2014-04-17 19:03:14,102][INFO ][org.apache.http.impl.client.DefaultHttpClient] I/O exception (org.apache.http.NoHttpResponseException) caught when processing request: The target server failed to respond
[2014-04-17 19:03:14,102][INFO ][org.apache.http.impl.client.DefaultHttpClient] Retrying request

I checked with netstat and found that the connection to S3 happens over a range of ports (it is a different port number each time I check); the plugin is probably trying different port numbers when it fails to connect.
I don't have all ports open (and I don't want to!), which is probably why new or updated files in S3 are not getting picked up by the plugin.
So, which ports need to be open for the plugin to be fully functional?

  2. If question 1 is valid, i.e. there is a set of ports that the plugin requires to be open, is this port range configurable? I do not see any such option in the curl command currently.

Hi,

Actually, I can't say exactly which ports the plugin uses. The plugin relies on the Amazon S3 SDK, and more specifically on com.amazonaws.services.s3.AmazonS3Client, but it does not require any specific ports to be opened.

That said, from what you can check in the source, the client only seems to use a basic wrapped HTTP client. So I think the target host port should only be 80 or 443.
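
To check that assumption from the Elasticsearch host, a small probe along these lines might help. It is only a sketch: the class name PortProbe, the default host s3.amazonaws.com and the 5 second timeout are illustrative choices, not something the plugin uses. Note also that the changing port numbers shown by netstat are most likely the local (ephemeral) source ports of each outbound connection, which differ per connection, whereas a firewall rule would normally match the remote port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the plugin: checks whether an outbound
// TCP connection to a given host and port can be established.
class PortProbe {

    /** Returns true if a TCP connection to host:port opens within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default endpoint; pass your bucket's regional endpoint as the first argument.
        String host = args.length > 0 ? args[0] : "s3.amazonaws.com";
        for (int port : new int[] {80, 443}) {
            System.out.println(host + ":" + port + " reachable = " + canConnect(host, port, 5000));
        }
    }
}
```

If both ports report as reachable, the firewall is probably not blocking the initial connection, and the NoHttpResponseException would need another explanation.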

To investigate, could you provide:

  • the version of plugin you use,
  • the version of Elasticsearch you use,
  • the scanning update rate you configure.

I ran my tests with version 0.0.2 of the plugin, ES 0.90.0 and an update_rate set to 36000, and everything works well: new docs are picked up as they're added to S3.