logstash-plugins/logstash-input-s3

Files with the same last-modified timestamp are missed during processing


Logstash information

  1. Logstash version: 8.2.3
  2. Logstash installation source: logstash-8.2.3-linux-x86_64.tar.gz
  3. How is Logstash being run: supervisor
  4. How was the Logstash plugin installed: default plugin bundled with Logstash
  5. Input configuration:
  input {
    s3 {
      type => "something"
      sincedb_path => "/data/elk/logstash-8.2.3/since_db_s3_something"
      temporary_directory => "/tmp/logstash/shippersomething"
      bucket => "s3-bucket"
      prefix => "logserver/something"
      interval => 120
      region => "us-west-2"
      codec => "json"
      access_key_id => "*********************"
      secret_access_key => "****************************"
    }
  }

JVM version: 1.8.0_232
OS version: CentOS Linux release 7.6.1810

We upload our log files to S3 every minute over the public network; Logstash then pulls them from S3 and outputs them to Elasticsearch on the subnet.

When the public network is unstable, some uploads to S3 fail and are retried. Once the network recovers, several files can therefore end up with the same last-modified timestamp. When Logstash processes files that share a last-modified timestamp, some of them are missed.
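
For illustration, here is a minimal sketch of the suspected failure mode, assuming the sincedb records only the single newest last-modified timestamp processed so far and that listing keeps only strictly newer objects. This is Python with hypothetical names and data, not the plugin's actual Ruby code:

    from datetime import datetime, timezone

    def list_new_files(objects, sincedb_time):
        # Strictly-greater filter: only objects newer than the recorded
        # timestamp are treated as unprocessed.
        return [o for o in objects if o["last_modified"] > sincedb_time]

    t = datetime(2022, 6, 1, 12, 0, 0, tzinfo=timezone.utc)

    # Three re-uploaded files land with an identical last_modified.
    objects = [
        {"key": "logserver/something/a.log", "last_modified": t},
        {"key": "logserver/something/b.log", "last_modified": t},
        {"key": "logserver/something/c.log", "last_modified": t},
    ]

    # Suppose a.log is processed first and the sincedb advances to t.
    sincedb_time = t

    # b.log and c.log share that timestamp, so neither is strictly newer
    # than the sincedb value and both are skipped on the next pass.
    print(list_new_files(objects, sincedb_time))  # prints []

Under that assumption, any file whose timestamp equals the recorded sincedb value is never strictly newer, so it is silently dropped on every subsequent listing pass.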

This problem occurs frequently: in our case, with files uploaded to S3 every minute, 3 to 10 files are missed every day.

See also #191.