logstash-plugins/logstash-filter-grok

Behaviour when pattern writes to same input field (without "overwrite" option)

hackery opened this issue · 1 comments

The behaviour of grok when a capture's semantic name is the same as the input field is not described in the documentation, and it is counter-intuitive. For example:

input {
  generator {
    "message" => "hello world"
    "count" => 1
  }
}

filter {
  grok {
    match => { "message" => "hello %{GREEDYDATA:message}" }
  }
}

output { stdout{} }

If you didn't read the docs carefully, you might assume that message gets overwritten with "world" (perhaps the most obvious behaviour). After reading the overwrite section, you would probably conclude that you need to set overwrite to get that behaviour, and that without it the captured value is simply discarded.

What you actually get is an array, which is, I feel, an unwelcome surprise. The code explicitly promotes an existing string to an array and appends to it. If it was already an array, it appends to it - that's fair enough.

{
    ...
    "message" => [
        [0] "hello world",
        [1] "world"
    ],
}
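
For contrast, this is roughly how the filter would look with the overwrite option set. It is a sketch based on the documented option rather than something tested here. Per the overwrite section of the docs, the captured value should then replace the original message instead of being appended to it.

```
filter {
  grok {
    # Same pattern as above; the capture is still named after the input field.
    match => { "message" => "hello %{GREEDYDATA:message}" }
    # List the field here so the captured value replaces the existing one.
    overwrite => [ "message" ]
  }
}
```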

Please describe this behaviour in the documentation or, if it is not actually intended, consider it a bug.
It also looks like there is in fact no way to specify that only the original string should be retained in the field.

This behavior is consistent with add_field as well. If you try to add a field that already exists, the new value is added alongside the old one and the field becomes an array. That particular issue is being tracked in elastic/logstash#11751.