logstash-plugins/logstash-input-s3

Unable to read S3 file that has + character in its name

One of my team members is facing the following issue. Has anyone come across something similar before?

Airflow writes its log files to Cleversafe S3 buckets using the default log path template provided by Airflow ({dag_id}/{task_id}/{execution_date}/{try_number}.log).

The dag_id, task_id, execution_date, and try_number values above are all dynamic and keep changing at runtime, so the log paths end up looking something like this:
s3://XXXX/airflow/logs/XXX_YYY//2020-04-26T16:01:00+00:00/1.log

When trying to read the logs via the Logstash S3 input, it cannot read this location: the execution_date contains a + sign, and the + gets replaced with a space. As a result, Logstash cannot find the location and never reads these log files.
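
For what it's worth, this looks like form-style URL decoding, where '+' is treated as an encoded space. Here is a minimal Ruby sketch of that decoding behavior (Ruby only because the plugin itself is written in Ruby; the key below is a made-up example modeled on our paths):

require 'cgi'

# Hypothetical key modeled on the paths above:
key = "airflow/logs/XXX_YYY/2020-04-26T16:01:00+00:00/1.log"

# Form-style URL decoding treats '+' as an encoded space,
# which mangles S3 keys that legitimately contain '+':
puts CGI.unescape(key)
# => "airflow/logs/XXX_YYY/2020-04-26T16:01:00 00:00/1.log"

If the plugin, or the object store itself, runs keys through a decode like this anywhere, that would explain exactly the symptom above.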

Any ideas or solutions to overcome this issue are appreciated. Thanks.

kares commented

Hey, this seems very weird. I've just tried reading files from a bucket with special characters in both the prefix and the file name:

input {
  s3 {
    prefix => "nested/sailor/logs/A_TEST/2020-04-21T:12:22:22+00:00/"
    # contains file: +test2+
    # s3://kares-test1/nested/sailor/logs/A_TEST/2020-04-21T:12:22:22+00:00/+test2+

    aws_credentials_file => "../aws_credentials.yml"
    bucket => "kares-test1"

    type => "s3"
    interval => 10
    additional_settings => { force_path_style => true }
  }
}

... the content from the +test2+ file was read and printed.

I've tried this with AWS S3, so I believe this is a compatibility issue with the IBM "Cleversafe S3 Buckets" product.
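
If it helps narrow things down, here is a rough diagnostic sketch using the Ruby AWS SDK directly (the plugin talks to S3 through the same SDK). The endpoint, bucket, and prefix below are placeholders; credentials are picked up from the environment or shared config:

require 'aws-sdk-s3'

# Point the SDK at the non-AWS store; endpoint/bucket/prefix are placeholders.
client = Aws::S3::Client.new(
  endpoint: "https://cleversafe.example.com",
  region: "us-east-1",
  force_path_style: true
)

# List keys under the prefix and check whether '+' comes back
# as '+' or has already been turned into a space by the store:
resp = client.list_objects_v2(bucket: "XXXX", prefix: "airflow/logs/")
resp.contents.each { |obj| puts obj.key }

If the listed keys already come back with spaces instead of '+', the problem is on the Cleversafe side rather than in this plugin.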