logstash-plugins/logstash-input-s3

Plugin does not process objects correctly, does not delete or back up

Opened this issue · 11 comments

  1. Logstash OSS versions 7.16-8.1
  2. Docker
  3. Kubernetes (OpenShift 4.7)
  4. Plugin is included in the image

Hello. We are having trouble with the S3 input plugin against a private S3-compatible storage (MinIO).
Logstash reads the objects and sends them to the output normally, but backup and delete are not working.
The objects stay in the source bucket unchanged; they are small JSON access-log files, 1-2 kB on average.

Input config:
    input {
      s3 {
        access_key_id => "${S3_ACCESS_KEY}"
        secret_access_key => "${S3_SECRET_KEY}"
        endpoint => {{ $.Values.s3_connect_endpoint | quote }}
        bucket => "test-bucket"
        prefix => "prefix"
        backup_to_bucket => "backup-bucket"
        backup_add_prefix => "processed"
        delete => true
      }
    }

The IAM role is allowed all actions; I verified that by deleting an object with the mcli tool.
In the S3 access logs I see only successful (200) GET and HEAD requests, and not a single PUT, POST or DELETE.
In the Logstash log I see only success messages like the ones below:

{"level":"INFO","loggerName":"logstash.inputs.s3","timeMillis":1646814827669,"thread":"[main]<s3","logEvent":{"message":"epaas-caasv3-backups/2022-03-05-09-20-02-312 is updated at 2022-03-05 06:20:02 +0000 and will process in the next cycle"}}

{"level":"INFO","loggerName":"logstash.inputs.s3","timeMillis":1646814827800,"thread":"[main]<s3","logEvent":{"message":"epaas-caasv3-backups/2022-03-05-09-20-02-396 is updated at 2022-03-05 06:20:02 +0000 and will process in the next cycle"}}

{"level":"INFO","loggerName":"logstash.inputs.s3","timeMillis":1646814827932,"thread":"[main]<s3","logEvent":{"message":"epaas-caasv3-backups/2022-03-05-09-20-03-185 is updated at 2022-03-05 06:20:03 +0000 and will process in the next cycle"}}

I found some interesting code:
https://github.com/logstash-plugins/logstash-input-s3/blob/main/lib/logstash/inputs/s3.rb#L383

As I understand it, the plugin compares the last_modified of the object with the one recorded during listing; according to my log it postpones processing the object to the next cycle, and after the default 60 seconds the same thing repeats.
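For context, the decision around the linked line looks roughly like this - a paraphrased sketch of the plugin's Ruby logic as I read it, not a verbatim copy (identifiers simplified):

    # Paraphrased sketch: the backup/delete/sincedb branch only runs when the
    # object's current last_modified still equals the timestamp recorded when
    # the bucket was listed.
    def process_log(queue, log)
      object   = @s3bucket.object(log.key)
      filename = File.join(temporary_directory, File.basename(log.key))

      if download_remote_file(object, filename) && process_local_log(queue, filename, object)
        if object.last_modified == log.last_modified
          backup_to_bucket(object)                  # copy to backup_to_bucket
          delete_file_from_bucket(object)           # honour delete => true
          FileUtils.remove_entry_secure(filename, true)
          sincedb.write(log.last_modified)          # advance the sincedb
        else
          @logger.info("#{log.key} is updated at #{object.last_modified} and will process in the next cycle")
        end
      end
    end

If the two timestamps never compare equal, the backup/delete/sincedb branch is never reached, which would match what I see.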

I also tried setting sincedb_path => "/tmp/logstash/since.db", but the file is never created.
Objects from the bucket are downloaded into /tmp/logstash/ and stay there.

Same problem here, using the logstash 8.2.0 docker image. Switched to a fork...

Same here, switched to a fork.

@kaisecheng any ideas why this happens?

The reason for comparing the last modified time of the object with the one from the listing is to confirm the object has not been updated since the list action. If the object gets updated, its last modified time will push it to the next cycle. Removing the comparison would lead to duplication/reprocessing of ingested data.

I also tried setting sincedb_path => "/tmp/logstash/since.db", but the file is never created.

The plugin can't work properly without the sincedb. Maybe the Logstash user lacks permission to write to that path?
Enabling debug logging should give some hints.
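For the permission part, a minimal check (a sketch, assuming you run it inside the container as the same user that runs Logstash, with the sincedb_path from your report) would be:

    # Minimal sketch: try to create the sincedb file the way a plugin would.
    require 'fileutils'

    sincedb_path = "/tmp/logstash/since.db"   # sincedb_path from the report
    dir = File.dirname(sincedb_path)

    begin
      FileUtils.mkdir_p(dir)            # create the directory if it is missing
      File.open(sincedb_path, "a") {}   # open for append, creating the file if needed
      puts "OK: #{sincedb_path} can be created/written by uid #{Process.uid}"
    rescue SystemCallError => e
      puts "FAILED: #{e.class} - #{e.message}"
    end

If that fails, it is a filesystem/user problem rather than a plugin problem.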

We use a MinIO S3 bucket with admin s3:* permissions. Logstash reads the logs fine, but it keeps re-reading them all the time.

but it keeps re-reading them all the time

It sounds like the plugin has an issue updating the sincedb. To compare object timestamps, Logstash needs to write the last modified time to sincedb, otherwise, the objects are reprocessed in the next cycle. Please check if Logstash is able to write to sincedb_path and if the file (sincedb) is updated successfully.
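If the sincedb is being written, the file should hold a single timestamp that advances between polling cycles. A rough way to watch it (hypothetical path - use your configured sincedb_path, or the auto-generated sincedb file under Logstash's data directory if you did not set one):

    # Rough sketch: print the sincedb content twice, slightly more than one
    # default 60s polling interval apart, to see whether it advances.
    sincedb_path = "/tmp/logstash/since.db"   # adjust to your sincedb_path

    2.times do |i|
      if File.exist?(sincedb_path)
        puts "#{Time.now} sincedb => #{File.read(sincedb_path).strip.inspect}"
      else
        puts "#{Time.now} sincedb does not exist yet"
      end
      sleep 70 if i.zero?
    end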

Should I write something to sincedb_path myself? And how do I check whether Logstash is able to write to sincedb_path?

I tried running two pipelines simultaneously: one using an AWS S3 bucket, the other a MinIO S3 bucket. In both cases I found no errors in debug mode.

The debug log stated that both pipelines had their default sincedb file created, BUT only one actually existed at the mentioned path - the one for the AWS bucket.

It's not local filesystem permissions and not MinIO permissions (we use admin credentials). There are not enough logs to understand why this happened.

Please advise how to debug and fix this.

@kaisecheng

@lysenkojito
The permission I refer to is that the user running Logstash must have enough privilege to write to disk at sincedb_path. Taking the docker environment as an example, the default user is logstash.

  1. Make sure the logstash user can read and write the sincedb_path location
  2. Make sure each s3 input has a unique sincedb_path (this setting must be a file path, not just a directory); see the example below
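For example, something like this (hypothetical endpoint variable, bucket names and paths; /usr/share/logstash/data is the default data directory in the official docker image):

    input {
      s3 {
        endpoint => "${S3_ENDPOINT}"
        access_key_id => "${S3_ACCESS_KEY}"
        secret_access_key => "${S3_SECRET_KEY}"
        bucket => "bucket-a"
        sincedb_path => "/usr/share/logstash/data/plugins/inputs/s3/sincedb-bucket-a"
      }
      s3 {
        endpoint => "${S3_ENDPOINT}"
        access_key_id => "${S3_ACCESS_KEY}"
        secret_access_key => "${S3_SECRET_KEY}"
        bucket => "bucket-b"
        sincedb_path => "/usr/share/logstash/data/plugins/inputs/s3/sincedb-bucket-b"
      }
    }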

BUT only one actually existed at the mentioned path - the one for the AWS bucket.

Are you setting the same sincedb path in both pipelines? If the paths are unique, I would expect to see an error in the log for the MinIO S3 input. If you believe it is a bug, the best path forward is to create a new issue including a reproducer with the debug log, config and pipelines for further investigation. We support AWS S3 officially; help for MinIO S3 will be limited.

@kaisecheng
The sincedb paths were left at the defaults. They had different names, but the same folder - …/s3/

It's definitely not a permissions issue.
Okay, I'll create an issue. Thank you.