vanilla/fluent-plugin-burrow

nested message field in message field gets lost when overlaying parsed message

warmfusion opened this issue · 4 comments

Given a json structured rsyslog event which has a field named message which includes a jsonformatted json string from an application.
And fluent configuration such that the message field from rsyslog is parsed as json and overlayed over the original event.
When an event is emitted by an app which includes message as part of its log,
Then the burrow plugin parses the json, overlays it onto the main record but removes the `message field (which was parsed out of the message.

Rsyslog Configuration

# A lovely template
template(name="json-photon"
  type="list") {
    constant(value="{")
      constant(value="\"@timestamp\":\"")       property(name="timereported" dateFormat="rfc3339")
      constant(value="\",\"type\":\"syslog")
      constant(value="\",\"tag\":\"")           property(name="syslogtag" format="json")
      constant(value="\",\"relayhost\":\"")     property(name="fromhost")
      constant(value="\",\"relayip\":\"")       property(name="fromhost-ip")
      constant(value="\",\"logsource\":\"")     property(name="source")
      constant(value="\",\"hostname\":\"")      property(name="hostname" caseconversion="lower")
      constant(value="\",\"program\":\"")       property(name="programname")
      constant(value="\",\"priority\":\"")      property(name="pri")
      constant(value="\",\"severity\":\"")      property(name="syslogseverity")
      constant(value="\",\"facility\":\"")      property(name="syslogfacility")
      constant(value="\",\"severity_label\":\"")   property(name="syslogseverity-text")
      constant(value="\",\"facility_label\":\"")   property(name="syslogfacility-text")
      constant(value="\",\"message\":\"")       property(name="msg" format="json")
    constant(value="\"}\n")
}

# This line sends all lines to defined IP address at port 10034,
# using the "json-photon" format template

*.*                         @fluent:10034;json-photon

FluentD Configuration

<system>
  log_level trace
</system>

<source>
    port 10034
    bind 0.0.0.0
    tag syslog
    format json
    @type udp
    @id syslog-source
</source>


<filter>
     @type stdout
     @id debug-filter
</filter>


<match syslog>
  @type rewrite_tag_filter
  @id syslog-matcher
  <rule>
    key tag
    pattern /^(\w+)/
    tag syslog.$1
  </rule>
</match>

# Attempt to filter all events as if they have json
# and when they do, expand the object and
# overlay to the base of this record
# then reemit without the `syslog` prefix for processing
# like normal things

<match syslog.**>
    @type burrow
    key_name message
    format json
    remove_prefix syslog
    action overlay
</match>

<match **>
     @type stdout
     @id debug-matcher
</match>

Python Test Code

from logging.handlers import SysLogHandler
import logging

logger = logging.getLogger()

#add handler to the logger
handler = logging.handlers.SysLogHandler()


#add formatter to the handler
formatter = logging.Formatter('python: { "loggerName":"%(name)s", "timestamp":"%(asctime)s", "pathName":"%(pathname)s", "logRecordCreationTime":"%(created)f", "functionName":"%(funcName)s", "levelNo":"%(levelno)s", "lineNo":"%(lineno)d", "time":"%(msecs)d", "levelName":"%(levelname)s", "message":"%(message)s"}')

handler.formatter = formatter
logger.addHandler(handler)



#logger.info("Test INFO Message")
#logging.warn("Hello WARNING message")
#logging.error("Hello ERROR message")

logging.critical("Hello CRITICAL message")

Results

Fluent logs show (Formatted json for readability);

fluent_1   | 2019-11-23 12:24:02.203101000 +0000 syslog.python: 
{
  "@timestamp": "2019-11-23T12:24:02.202400+00:00",
  "type": "syslog",
  "tag": "python:",
  "relayhost": "172.18.0.1",
  "relayip": "172.18.0.1",
  "logsource": "172.18.0.1",
  "hostname": "172.18.0.1",
  "program": "python",
  "priority": "10",
  "severity": "2",
  "facility": "1",
  "severity_label": "crit",
  "facility_label": "user",
  "message": " { \"loggerName\":\"root\", \"timestamp\":\"2019-11-23 12:24:02,191\", \"pathName\":\"send_syslog_message.py\", \"logRecordCreationTime\":\"1574511842.191565\", \"functionName\":\"<module>\", \"levelNo\":\"50\", \"lineNo\":\"22\", \"time\":\"191\", \"levelName\":\"CRITICAL\", \"message\":\"Hello CRITICAL message\"}"
}

which includes the message from rsyslog, and the json formatted string.

Once it parses the event we then get

fluent_1   | 1970-01-01 00:03:11.000000000 +0000 python: 
{
  "@timestamp": "2019-11-23T12:24:02.202400+00:00",
  "type": "syslog",
  "tag": "python:",
  "relayhost": "172.18.0.1",
  "relayip": "172.18.0.1",
  "logsource": "172.18.0.1",
  "hostname": "172.18.0.1",
  "program": "python",
  "priority": "10",
  "severity": "2",
  "facility": "1",
  "severity_label": "crit",
  "facility_label": "user",
  "loggerName": "root",
  "timestamp": "2019-11-23 12:24:02,191",
  "pathName": "send_syslog_message.py",
  "logRecordCreationTime": "1574511842.191565",
  "functionName": "<module>",
  "levelNo": "50",
  "lineNo": "22",
  "levelName": "CRITICAL"
}

Which does not include the message key anymore.

Expected

The message key from the origin event is replaced with the message key parsed from the json-string the origin record contained.

Hello,

It looks like you want to set keep_key to true in your burrow configuration. You can read about that in the readme here https://github.com/vanilla/fluent-plugin-burrow/blob/master/README.md

action can be one of 4 values: inplace, overlay, replace, and prefix. When using overlay, the default behaviour is to remove the original encoded key, but keep_key disables that.

Hope this fixes your issue.

That isn't what I need, that'd presumably keep the 'message' block for all events even if the nested value included a message key or not.

The key being kept is the one from the object inside the message, not the original message itself.

I now see what you're saying. This might be a bug.

Could be that https://github.com/vanilla/fluent-plugin-burrow/blob/master/lib/fluent/plugin/filter_burrow.rb#L88-L93 is moved before the merger.

It still means that if you have a nested key you might overwrite the original value, but not sure theres anything that can be done about that if overlay is chosen.

  def filter(tag, time, record)
    raw_value = record[@key_name]
    if raw_value then
      new_time, new_values = nil, nil
      @parser.parse(raw_value) do |parsed_time, parsed_values|
        new_time   = parsed_time
        new_values = parsed_values
      end

      if new_values then
        original_time = record[@record_time_key]
        new_time ||= original_time

        # Remove the key before we merge the new values in case the merger re-creates the key
        if ['overlay','replace','prefix'].include? @action
          if not @keep_key and record.has_key?(@key_name)
            record.delete(@key_name)
          end
        end

        # Overlay new record on top of original record?
        new_record = case @action
        when 'inplace'
          record.merge({@key_name => new_values})
        when 'overlay'
          record.merge(new_values)
        when 'replace'
          new_values
        when 'prefix'
          record.merge({@data_prefix => new_values})
        end

        # Preserve 'time' key?
        if @keep_time
          new_record[@record_time_key] = original_time
        end

        new_record
      end
    end
  end