logstash-plugins/logstash-input-file

Add option to disable file truncation detection (patch attached)

maciejfranek opened this issue · 5 comments

Hi, recently I've been struggling with a duplicated-logs issue; below you'll find a description of my problem and a potential solution. I'm happy to hear your feedback and your opinion on whether this is a feature worth integrating into your code.

Duplicated logs problem

Our platform uses Azure Files to store logs (SMB3 file share).
Due to network latency, the file size attribute is not always consistent with the actual file size, so the following scenario may happen:

Assume a service1.log file with 10000 bytes written to it, all of which logstash has already processed. Suppose the next log entry is incoming:

  1. service1 writes a log entry:
    {"timestamp":"1600689852496","level":"INFO","thread":"main","logger":"org.entrypoints.App$","message":"Execution completed in [354ms]"}
    That is 135 chars; for the sake of simplicity, let's assume each char is one byte.
  2. logstash reads the file and is able to get the log entry written by service1 in point 1.
    After processing the message, it updates its in-memory position of the last
    processed byte: 10000 + 135 = 10135.
  3. logstash checks the file size attribute, but it receives 10000 (due to network latency), the value from before the service appended the log entry.
    At this point logstash detects a file truncation, because the file size attribute is less than the last-processed-byte position calculated in point 2.
    As a result, logstash reads the file from the beginning, duplicating the already-processed entries (see the sketch below).
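
To make the failure mode concrete, here is a minimal Ruby sketch of the shrink check described above (variable names are illustrative, not the plugin's actual internals):

	# Position after processing the new entry in point 2.
	bytes_read = 10_135
	# Stale SMB metadata still reports the pre-append size.
	stat_size = 10_000

	if stat_size < bytes_read
	  # Interpreted as truncation: the read position resets to zero, so the
	  # 10_000 already-processed bytes are read (and emitted) a second time.
	  bytes_read = 0
	end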

Potential mitigation

Add an option to disable file truncation detection. For this to be safe, the following condition must be met:
the rolling policy must be designed in such a way that a completed log file is abandoned (no file truncation is performed) and subsequent log entries are written to a new file.

option_to_disable_file_truncation_detection.patch.log (attached with a .log extension, as github does not allow the .patch extension) contains the changes that allow disabling file truncation detection.
The patch is built from the v4.2.4 tag.
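
Since the patch is attached rather than inlined, here is a rough Ruby sketch of the gating idea (a sketch only; the detect_shrunk name is taken from the example usage below, and this is not the patch verbatim):

	# Shrink detection gated behind a boolean option.
	def shrunk?(stat_size, bytes_read, detect_shrunk: true)
	  detect_shrunk && stat_size < bytes_read
	end

	shrunk?(10_000, 10_135)                        # => true  (current behavior)
	shrunk?(10_000, 10_135, detect_shrunk: false)  # => false (detection disabled)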

Example usage:

	file {
		path => "/usr/share/logstash/logs/*.json"
		exclude => [
			"*.gz"
		]
		codec => "json"
		detect_shrunk => false
	}

Thanks.

logstash reads the file and is able to get the log entry written by service1 in point 1.

Why would it read that entry? Reading additional data from the file is driven by stat returning a file size larger than the number of bytes already read.
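
In other words, the read path is gated roughly like this (an illustrative sketch, not the plugin's actual code):

	stat_size  = 10_000   # size reported by stat
	bytes_read = 10_000   # bytes consumed so far

	if stat_size > bytes_read
	  # only then is the region between bytes_read and stat_size read
	end

If stat kept returning the stale 10000, the new entry should not have been read in the first place.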

I might not understand the code correctly then.
I wonder what could cause such messages in logstash:

[2021-03-22T13:48:50,977][ERROR][filewatch.tailmode.processor][main][e2926275b17505373c71a9dcbe1361d462c5f63902d07b8992a43e0e63d20851] process_active file shrunk: new size is 169100, old size 196731 {:path=>"2021-03-22.service1-75677666c7-8hf8t.logjson.0"}

(I should have provided it in the previous post.) I always write to the file and never remove content from it, which is how appenders behave. Seeing such a logstash log message is a big surprise to me.

kares commented

Thanks for the report; I'm not sure we want the file input designed around every flaw another system has.
I do not mind having an option, but then why have it for "shrunk" only? Do we eventually end up with multiple such options here?

logstash checks the file size attribute, but it receives 10000 (due to network latency)

It's very bad that stat would not give us an accurate size (on an SMB share), despite us having read more data than the reported size. This is very strange. There might be other things that go wrong if the content and the size are not aligned.

Maybe a higher stat_interval (1s default) would help you see fewer of these; see the example below.
Otherwise we might need more input to understand what's going on
(e.g. trace logging from the file input with one input file).
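
For example, raising the interval in the input config (the 5-second value is illustrative, not a recommendation):

	file {
		path => "/usr/share/logstash/logs/*.json"
		stat_interval => "5 seconds"
	}

A longer interval means stat is polled less often, so there are fewer chances to catch the SMB size attribute in a stale state.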

Thanks for your response.
I'm struggling to reproduce the issue with only one file. I'll provide the trace info once I manage to get it.

Cannot reproduce... Closing