logstash-plugins/logstash-input-file

sincedb_clean_after is set to wrong duration

noosc opened this issue · 13 comments

noosc commented

@sincedb_expiry_duration = self.class.days_to_seconds(@opts.fetch(:sincedb_clean_after, 14))

I found that the param sincedb_clean_after does not behave as the duration described in the docs.
When I configure sincedb_clean_after => 0.0001, records are actually cleaned after 8.64 days;
when I configure sincedb_clean_after => "0.0001s", records are actually cleaned after 8.64 seconds.

The value must be getting multiplied by 86400 once more somewhere; please fix this bug.
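
(For reference, a quick back-of-the-envelope check of the numbers reported above, in Ruby; this assumes exactly one extra multiplication by 86400 is applied:)

    # Sanity check of the reported durations, assuming one extra *86400
    # (seconds per day) somewhere in the conversion chain.
    DAY = 24 * 3600 # => 86400

    # sincedb_clean_after => 0.0001 (numeric, documented unit: days)
    0.0001 * DAY        # => 8.64 seconds (the expected behaviour)
    0.0001 * DAY * DAY  # => 746496.0 seconds, i.e. 8.64 days (the observed behaviour)

    # sincedb_clean_after => "0.0001s" (string, meant as 0.0001 seconds)
    0.0001              # expected: 0.0001 seconds
    0.0001 * DAY        # => 8.64 seconds (the observed behaviour)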

woby commented

I'm facing the same issue.
Thank you, noosc. Your workaround has worked well.

Well, I found some suspicious code.

config :sincedb_clean_after, :validate => [FriendlyDurations, "days"], :default => "14 days" # days

According to the code here, sincedb_clean_after has a validator, FriendlyDurations, and FriendlyDurations says "coerce into seconds":

    def self.call(value, unit = "sec")
      # coerce into seconds

Therefore, though I'm not sure because I don't know how Logstash handles this internally, @sincedb_clean_after in LogStash::Inputs::File probably already contains the value converted into seconds.
That value is then passed to ObservingTail etc. as-is. Since ObservingTail assumes the unit of sincedb_clean_after is days, the result is probably a value multiplied by 86400 one time too many.
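
(A minimal sketch of the kind of coercion such a validator performs; the names and the simplified parsing are mine, not the plugin's actual implementation:)

    # Simplified model of a "friendly durations" validator that coerces
    # values like 14, "14 days", "20 min" or "0.0001s" into seconds.
    module FriendlyDurationsSketch
      UNIT_TO_SECONDS = {
        "s" => 1, "sec" => 1, "seconds" => 1,
        "m" => 60, "min" => 60, "minutes" => 60,
        "h" => 3600, "hour" => 3600, "hours" => 3600,
        "d" => 86400, "day" => 86400, "days" => 86400,
      }

      # default_unit is used when the value carries no unit of its own,
      # e.g. "days" for sincedb_clean_after.
      def self.call(value, default_unit = "sec")
        m = value.to_s.strip.match(/\A(?<number>\d+(\.\d+)?)\s*(?<unit>[a-z]*)\z/)
        return nil unless m
        unit = m[:unit].empty? ? default_unit : m[:unit]
        m[:number].to_f * UNIT_TO_SECONDS.fetch(unit)
      end
    end

    FriendlyDurationsSketch.call("14 days")  # => 1209600.0
    FriendlyDurationsSketch.call(14, "days") # => 1209600.0
    FriendlyDurationsSketch.call("0.0001s")  # => 0.0001

If @sincedb_clean_after already holds seconds like this, any later days-to-seconds conversion would multiply by 86400 a second time.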

We had a similar issue, but I'm not sure whether the 86400 factor is involved.

Our version is:
logstash-input-file (4.1.6)

Every day I zip files using find with the -mtime -3 parameter, so an inode is released after at most 4 days. Despite that, using:
sincedb_clean_after => "0.1" # should be ~2.5h
sincedb_clean_after => "20 min"

was not working at all. I was constantly getting:

[2019-04-16T00:57:59,850][WARN ][filewatch.tailmode.processor] >>> Rotation In Progress - inode change detected and original content is not fully read, file is closed and path points to new content {"watched_file details"=>"<FileWatch::WatchedFile: @filename='FILE_NAME', @State='rotation_in_progress', @recent_states='[:watched, :watched]', @bytes_read='0', @bytes_unread='0', current_size='7184', last_stat_size='7184', file_open?='false', @initial=false, @sincedb_key='11010064 0 64771'>"}
[2019-04-16T01:08:21,471][WARN ][filewatch.tailmode.processor] >>> Rotation In Progress - inode change detected and original content is not fully read, file is closed and path points to new content {"watched_file details"=>"<FileWatch::WatchedFile: @filename='FILE_NAME', @State='rotation_in_progress', @recent_states='[:watched, :watched]', @bytes_read='0', @bytes_unread='0', current_size='7704', last_stat_size='7704', file_open?='false', @initial=false, @sincedb_key='11010066 0 64771'>"}

and of course Logstash started processing the new file from the byte offset stored in sincedb. The sincedb line for this inode stayed the same.

I solved the issue by setting:
sincedb_clean_after => "0.00012"

But I'm not sure how this value "0.00012" is processed - it's supposed to be ~12 min, but multiplied by 86400 it would give ~200 days, much longer than my 3 days. What's certain is that I checked, and it is now somehow working properly: I can see in sincedb that the old line with the old inode information is being deleted, and a new entry is being added as the last line.

This would explain why it's not working for me at all on my 6.7 RPM install - my 12 hours is probably being turned into something much larger.

@Zeal0us @krecik82 @woby @noosc

sincedb_clean_after is a relatively new setting. As the docs indicate, it can be specified as a numeric or a string value - if a numeric value is specified then it is interpreted as being in DAYS; if a string value is specified then you can use the "Friendly Durations" formats for convenience, flexibility and better self-documenting configs.

I chose (after some consultation) to represent days because the sincedb entries need to be present long enough to prevent content re-reads but short enough to limit potential inode recycling, and this duration is most likely to be measured in days.
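
(For illustration, the two forms side by side - example values only, assuming the documented Friendly Durations formats:)

    sincedb_clean_after => 14          # numeric: interpreted as 14 days
    sincedb_clean_after => "14 days"   # friendly-duration string, same meaning
    sincedb_clean_after => "36 hours"  # friendly-duration string, sub-day precision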

noosc commented

@guyboertje

We are talking about different things, I think.
If sincedb_clean_after => "1s" is set, the file index is actually cleaned after 1 day;
if sincedb_clean_after => 1 is set, the file index is actually cleaned after 86400 days.

Though I've just tested sincedb_clean_after => "0.0001s", and that duration is about 5-10s.
I'm not sure about the factor of 86400; it's only a supposition.
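
(Both observations are consistent with a single extra *86400, assuming FriendlyDurations first coerces the value to seconds and a days-to-seconds conversion is then applied on top:)

    DAY = 24 * 3600

    # sincedb_clean_after => "1s": coerced to 1 second, then treated as days:
    1 * DAY           # => 86400 seconds = 1 day

    # sincedb_clean_after => 1: coerced to 86400 seconds (1 day), then treated as days:
    (1 * DAY) * DAY   # => 7464960000 seconds = 86400 days

    # sincedb_clean_after => "0.0001s": coerced to 0.0001 seconds, then treated as days:
    0.0001 * DAY      # => 8.64 seconds (matches the observed 5-10s)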

@noosc
OK. Thanks for clarifying - I see the bug now.
self.class.days_to_seconds should only be applied to raw numerical values, not to values supplied via the Friendly Durations formats.

I'm not sure when I'll get some time to patch this though.

I think that the bug is a double conversion:

  • the ObservingBase.initialize invokes @settings = Settings.from_options(options)
  • this piece of code explains the error:

    def self.from_options(opts)
      new.add_options(opts)
    end

    def self.days_to_seconds(days)
      (24 * 3600) * days.to_f
    end

    def initialize
      defaults = {
        :delimiter => "\n",
        :file_chunk_size => FILE_READ_SIZE,
        :max_open_files => 4095,
        :file_chunk_count => MAX_ITERATIONS,
        :sincedb_clean_after => 14,
        :exclude => [],
        :stat_interval => 1,
        :discover_interval => 5,
        :file_sort_by => "last_modified",
        :file_sort_direction => "asc",
      }
      @opts = {}
      @lastwarn_max_files = 0
      add_options(defaults)
    end

    def add_options(opts)

The from_options class method invokes initialize, which invokes add_options (here the first multiplication happens), and then invokes add_options again on the newly created instance (the second multiplication).
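
(A condensed sketch of that double application - reduced to the relevant lines, not the plugin's exact call chain:)

    # Sketch: the same days-to-seconds conversion applied twice.
    def days_to_seconds(days)
      (24 * 3600) * days.to_f
    end

    first  = days_to_seconds(14)     # => 1209600.0 seconds (the intended 14 days)
    second = days_to_seconds(first)  # => 104509440000.0 seconds, i.e. 1209600 days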

My previous comment is not correct. The bug arises in commit 28c0534, which introduced the FriendlyDurations module that already does the conversion to seconds, but did not remove the previous method self.days_to_seconds that converted days to seconds. So the first conversion is done by FriendlyDurations, and the old days_to_seconds is then applied to that result.
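
(The shape of the fix as described above - a sketch under that assumption, with hypothetical helper names, not the actual 4.1.15 patch:)

    # Apply the days-to-seconds conversion only to raw numerics; values that
    # went through FriendlyDurations are already in seconds and must be
    # passed through untouched. friendly_duration_to_seconds is a stand-in
    # for whatever the validator returns.
    def sincedb_clean_after_seconds(raw)
      if raw.is_a?(Numeric)
        (24 * 3600) * raw.to_f             # bare numeric: documented unit is days
      else
        friendly_duration_to_seconds(raw)  # string: already coerced, no extra *86400
      end
    end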

@noosc I've just published logstash-input-file 4.1.15 please could you verify if it resolves your issue?
Many thanks

noosc commented

@noosc I've just published logstash-input-file 4.1.15 please could you verify if it resolves your issue?
Many thanks

I downloaded the release package and installed it as follows:

gem build logstash-input-file.gemspec
xxx/bin/logstash-plugin install --local logstash-input-file-4.1.15.gem

but it raised an exception when running Logstash, like this:

Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"Java::JavaLang::IllegalStateException", :message=>"Unable to configure plugins: (PluginLoadingError) Couldn't find any input plugin named 'file'. Are you sure this is correct? Trying to load the file input plugin resulted in this error: no such file to load -- /Users/cc/Downloads/logstash-7.5.2/vendor/local_gems/7eebc83f/logstash-input-file-4.1.15/lib/jars/filewatch-1.0.1", :backtrace=>["org.logstash.config.ir.CompiledPipeline.<init>(CompiledPipeline.java:119)", "org.logstash.execution.JavaBasePipelineExt.initialize(JavaBasePipelineExt.java:60)"

My Logstash version is 7.5.2. Is there any other way to install 4.1.15?

It's weird, because the gem is deployed on RubyGems - could you try using that instead of building locally? (It shouldn't change anything, but give it a try.) Also, do you have the full output of the command bin/logstash-plugin install --local?
You could look for it in your local Logstash's Gemfile.lock and Gemfile, or instead of an install try the update:

bin/logstash-plugin update logstash-input-file
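
(If it helps, the installed version can usually be confirmed afterwards with something like:)

bin/logstash-plugin list --verbose logstash-input-file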

Hi @noosc, did you manage to install/update the plugin?

noosc commented

install/update always times out on my network; I will give it a try once 4.1.15 is bundled in a new Logstash version.