logstash-plugins/logstash-input-file

The file will be processed again after a Logstash restart

cliffordsun opened this issue

  • Version: Logstash 6.4.2, input plugin 4.1.5
  • Operating System: Linux
  • Config File:
input{
    file {
        path => "/data/log/standard.log*"
        type => "standard-log"
        ignore_older => "8 hours"
        max_open_files => 4095
    }
}
  • Steps to Reproduce:
  1. Start Logstash and let it process /data/log/standard.log.
  2. Restart Logstash or reload logstash.conf.
  3. Rename /data/log/standard.log to /data/log/standard.log.20190513-100000.
  4. The file /data/log/standard.log.20190513-100000 is processed again from the beginning instead of resuming from the last recorded position.
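Steps 1 and 3 can be mimicked outside Logstash to see why the sincedb entry still matches the renamed file. This is a sketch that assumes a POSIX filesystem (such as the Linux host here), where `File.rename` keeps the file's inode; the helper `simulate_rename_rotation` and the temporary directory are illustrative only and are not part of the plugin.

```ruby
require "tmpdir"

# Hypothetical helper: create standard.log, rotate it by renaming,
# then create a fresh standard.log, recording inodes along the way.
def simulate_rename_rotation(dir)
  live = File.join(dir, "standard.log")
  rotated = File.join(dir, "standard.log.20190513-100000")

  File.write(live, "line 1\nline 2\n")
  original_inode = File.stat(live).ino

  File.rename(live, rotated)   # step 3: rotation by rename
  File.write(live, "")         # the application starts a new standard.log

  {
    original: original_inode,
    rotated: File.stat(rotated).ino,
    new_live: File.stat(live).ino
  }
end

Dir.mktmpdir do |dir|
  inodes = simulate_rename_rotation(dir)
  puts inodes[:original] == inodes[:rotated]   # prints true: same inode, new path
  puts inodes[:original] == inodes[:new_live]  # prints false: a different file
end
```

The renamed file keeps the inode the sincedb was tracking, while its path no longer matches the one recorded before the restart.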

My issue relates to "Resend the data from the beginning of a file with sincedb setting".

My log file is named /data/log/standard.log. It is rotated when it grows large enough: the file is renamed with the current time as a suffix (for example, standard.log.20190513-100000).
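As a sketch of this naming scheme, assuming the suffix format `YYYYMMDD-HHMMSS` inferred from the example in step 3, the rotated name can be built with `Time#strftime`. The helper `rotated_name` is hypothetical; the real rotation is done by the application writing the log. `File.fnmatch` is used as an approximation of the glob expansion, to show that the rotated name still matches `path => "/data/log/standard.log*"` from the config, so the file input sees it as a newly discovered path.

```ruby
# Hypothetical helper: append the rotation time as a suffix,
# following the "standard.log.20190513-100000" example in step 3.
def rotated_name(path, time = Time.now)
  "#{path}.#{time.strftime('%Y%m%d-%H%M%S')}"
end

rotation_time = Time.new(2019, 5, 13, 10, 0, 0)
name = rotated_name("/data/log/standard.log", rotation_time)
puts name  # prints /data/log/standard.log.20190513-100000

# The rotated file still matches the glob from the input's path setting.
puts File.fnmatch("/data/log/standard.log*", name)  # prints true
```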
I found that SincedbValue has a member variable named path_in_sincedb, which is designed for read mode. In tail mode it is never assigned, except when the sincedb file is reloaded (for example, after a restart). However, the associate method of SincedbCollection compares the path of a newly discovered watched file with path_in_sincedb. After the file is rotated, the two paths differ, so the entry is discarded and Logstash reads the file from the beginning.

      if sincedb_value.watched_file.nil?
        # not associated
        if sincedb_value.path_in_sincedb.nil?
          handle_association(sincedb_value, watched_file)
          logger.trace("associate: inode matched but no path in sincedb")
          return true
        end
        if sincedb_value.path_in_sincedb == watched_file.path  # <----- HERE
          # the path on disk is the same as discovered path
          # and the inode is the same.
          handle_association(sincedb_value, watched_file)
          logger.trace("associate: inode and path matched")
          return true
        end
        # the path on disk is different from discovered unassociated path
        # but they have the same key (inode)
        # treat as a new file, a new value will be added when the file is opened
        sincedb_value.clear_watched_file
        delete(watched_file.sincedb_key)
        logger.trace("associate: matched but allocated to another")
        return true
      end
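The branch above can be reduced to a small model that shows its two outcomes for the rotated file. This is a simplified, hypothetical sketch, not the plugin's code: `SincedbEntry` and `resume_position` are illustrative names, the model assumes the inode key has already matched, the `0` return stands for "read from the beginning" as reported in this issue, and `5000` is an arbitrary example position.

```ruby
# Simplified, hypothetical model of the unassociated branch of
# SincedbCollection#associate. It assumes the inode key already matched
# and only reports where reading would resume.
SincedbEntry = Struct.new(:position, :path_in_sincedb)

def resume_position(entry, discovered_path)
  # No recorded path: associate by inode alone and keep the position.
  return entry.position if entry.path_in_sincedb.nil?
  # Same inode and same path: associate and keep the position.
  return entry.position if entry.path_in_sincedb == discovered_path
  # Same inode, different path: the entry is deleted and the file is
  # treated as new; per this issue it is then read from the beginning.
  0
end

rotated = "/data/log/standard.log.20190513-100000"

# After a restart the sincedb file is reloaded, so the old path is set.
p resume_position(SincedbEntry.new(5000, "/data/log/standard.log"), rotated)  # prints 0

# Without a restart, path_in_sincedb is never assigned in tail mode.
p resume_position(SincedbEntry.new(5000, nil), rotated)  # prints 5000
```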

When a file is rotated, a new file with the same name appears, which means the old file must have been renamed. Clearing path_in_sincedb in process_rotation_in_progress therefore seems like the right thing to do, and it would avoid my problem.
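The proposed change can be modeled in the same simplified way. In this hypothetical sketch, `RotatedEntry` and `on_rotation` are illustrative names; `on_rotation` only stands in for clearing path_in_sincedb inside process_rotation_in_progress and does not reproduce that method's real logic. Once the path is cleared, the lookup falls into the "no path in sincedb" branch and keeps the stored position.

```ruby
# Hypothetical model of the proposed fix. RotatedEntry and on_rotation are
# illustrative; on_rotation stands in for clearing path_in_sincedb in
# process_rotation_in_progress.
RotatedEntry = Struct.new(:position, :path_in_sincedb)

def on_rotation(entry)
  entry.path_in_sincedb = nil  # forget the pre-rotation path
  entry
end

# Simplified association rule from the current code:
# a nil or matching path keeps the position, a different path restarts at 0.
def resume_position(entry, discovered_path)
  path = entry.path_in_sincedb
  (path.nil? || path == discovered_path) ? entry.position : 0
end

rotated = "/data/log/standard.log.20190513-100000"
entry = RotatedEntry.new(5000, "/data/log/standard.log")

p resume_position(entry, rotated)  # prints 0 (current behavior)
on_rotation(entry)
p resume_position(entry, rotated)  # prints 5000 (with the proposed fix)
```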

I would appreciate it if you could check whether this is reasonable.