logstash-plugins/logstash-input-file

JSON parse errors, suspicions pointing at file{} not reading lines properly


Hi. I recently switched from reading files via filebeat to file{} in Logstash 6.5.4, on CentOS 7.5.1804. After the switch I've started to experience JSON parse errors; this was not an issue when the files were read with filebeat. The reason for using filebeat in the first place was the inode-reuse issue that has been present in Logstash, and it still appears to be an issue in my installation despite the use of sincedb_clean_after.

My filebeat config:

  - input_type: log
    paths:
      - /var/log/inputs/sample_*.log
    exclude_files: [".gz$"]
    ignore_older: 300m
    close_inactive: 1m
    clean_inactive: 301m
  # Logstash output, left out.

My file config:

file {
  id => "input-sample"
  path => "/var/log/inputs/sample_*.log"
  mode => "tail"
  ignore_older => "2 hours"
  sincedb_clean_after => "2.2 hours"
  sincedb_path => "/var/sincedb/sample"
  start_position => "beginning"
  close_older => "30 minutes"
  codec => "json"
}

Note: the log lines are newline-terminated (\n), but the json codec is used because file{} strips the delimiter from each line before passing it to the codec.

Files are named in the following format and are not subject to logrotate.
sample_2019-01-10-134001.log

Files are created every 15 minutes.

Error example:
[2019-01-10T12:15:11,524][ERROR][logstash.codecs.json ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unrecognized token 'ecipientAddress': was expecting ('true', 'false' or 'null')

The errors vary, in the sense that the JSON keys are cut off at different locations.
I've verified that the JSON input looks fine.

I have multiple non-JSON files that are read through file{} as well. If inode reuse were still a problem, those should in theory produce _grokparsefailure tags, because the logs would not be read from the beginning. But I have yet to see any _grokparsefailures for them.

I've tried using codec => "plain" together with filter { json { source => ... } }, but that still resulted in the JSON errors above.
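
For reference, this is roughly the plain-codec variant I tried, sketched from memory rather than copied from my actual pipeline; the json filter's tag_on_failure (its default tag, shown explicitly here) is only listed so the broken events are easy to find later:

input {
  file {
    path => "/var/log/inputs/sample_*.log"
    sincedb_path => "/var/sincedb/sample"
    start_position => "beginning"
    codec => "plain"
  }
}
filter {
  json {
    # "message" is the default field the plain codec puts the raw line into
    source => "message"
    # default failure tag, written out explicitly for clarity
    tag_on_failure => ["_jsonparsefailure"]
  }
}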

I'm currently wondering whether the errors are due to Logstash reading the files quicker than they are written. The files are written by a Python script that dumps JSON events and appends each one with \n. I don't know the internals of Logstash well enough to know how it handles such situations.

Example:
Logstash reads sample.log and sees the following line (no trailing \n yet):
"info1", "info2", "info3", "info4"

But 5 seconds later, the raw content is:
"info1", "info2", "info3", "info4", "info5"\n
"info6", "info7", "info8", "info9", "info10"\n

Would this confuse Logstash about where it should resume reading?
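
One thing I'm considering to narrow this down (a sketch, the output path is just an example): routing events that fail JSON parsing to a file, assuming they carry the _jsonparsefailure tag. Since the error above says the original data ends up in the message field, the raw lines could then be checked for truncation:

output {
  if "_jsonparsefailure" in [tags] {
    file {
      # example path for captured bad events
      path => "/var/log/logstash/json_failures.log"
      # write only the raw line that failed to parse
      codec => line { format => "%{message}" }
    }
  }
}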

I am also facing the exact same issue. I am using Logstash version 6.3.2 and my config is below:

input {
  file {
    path => ["/usr/local/tomcat/tomcat-i01/logs/grant.log","/usr/local/tomcat/tomcat-i01/logs/grant.log.*"]
    exclude => ["*.gz"]
    sincedb_path => "/usr/local/tomcat/logstash/sincedb/grant.sincedb"
    stat_interval => 2
    codec => multiline {
        pattern => "^\{"
        negate => true
        what => previous
    }
  }
}

I was previously using the json codec with start_position set to "beginning". At that time, I thought the error might be happening because the JSON event in the log file spans multiple lines in some cases. To handle that scenario, I changed the codec to multiline, removed start_position, and added stat_interval to allow time for the rotation to happen properly, so that Logstash can read the rotated file correctly.

The input log files are rotated every hour. What I have observed is that the last event logged in grant.log just before the rotation and the second event in the newly rotated file get merged, which breaks the JSON structure and throws a parsing error.

The first event logged in the newly rotated file seems to be dropped by Logstash.
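
One thing I still want to try (a sketch, the 5-second value is just a guess) is the multiline codec's auto_flush_interval, so the buffered event is flushed after a quiet period instead of waiting for the next line starting with "{":

codec => multiline {
  pattern => "^\{"
  negate => true
  what => previous
  # assumption: flushing after 5 idle seconds keeps the last event of the old file
  # from being merged with lines that later arrive from the newly rotated file
  auto_flush_interval => 5
}

I have not verified yet whether this actually prevents the merge across rotation.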