magnusbaeck/logstash-filter-verifier

@timestamp erroneously added to events when omitted

jgough opened this issue · 6 comments

If the @timestamp field is removed (for example with mutate { remove_field => [ "@timestamp" ] }) then LFV adds it back in making tests fail. Tested on Beta 2 in daemon mode

Config:

input { stdin { id => "input" } }
filter { mutate { remove_field => [ "@timestamp" ] } }
output { stdout { } }

Testcase:

input_plugin: "input"
testcases:
  - input:
      - Test
    expected:
      - message: Test

Result:

☐ Comparing message 1 of 1 from test.yml:
--- /tmp/937366809/test.yml/1/expected  2021-10-12 15:30:42.730339850 +0000
+++ /tmp/937366809/test.yml/1/actual    2021-10-12 15:30:42.730339850 +0000
@@ -1,3 +1,4 @@
 {
+  "@timestamp": "2021-10-12T15:30:42.641Z",
   "message": "Test"
 }

Summary: ☐ All tests: 0/1
         ☐ test.yml: 0/1

When run directly through logstash the above filter does not output a @timestamp field.

Dockerfile to repro (including displaying logstash's output without the @timestamp field):

# syntax=docker/dockerfile:1.3-labs
FROM docker.elastic.co/logstash/logstash:7.15.0

ENV LOGSTASH_FILTER_VERIFIER_VERSION v2.0.0-beta.2

USER root
RUN yum clean expire-cache && yum update -y && yum install curl && yum clean all
ADD https://github.com/magnusbaeck/logstash-filter-verifier/releases/download/${LOGSTASH_FILTER_VERIFIER_VERSION}/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz /opt/
RUN tar xvzf /opt/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz -C /opt \
    && mv /opt/logstash-filter-verifier /usr/bin/

USER logstash
RUN <<EOF
mkdir tests
mkdir pipeline/pipeline1
mkdir pipeline/pipeline2

cat <<EOT > /usr/share/logstash/config/pipelines.yml
- pipeline.id: pipeline
  path.config: "pipeline/pipeline.conf"
EOT

cat <<EOT > /usr/share/logstash/tests/test.yml
input_plugin: "input"
testcases:
  - input:
      - Test
    expected:
      - message: Test
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline.conf
input { stdin { id => "input" } }
filter { mutate { remove_field => [ "@timestamp" ] } }
output { stdout { } }
EOT

cat <<EOT > /usr/share/logstash/run_tests.sh
echo "Running directly through Logstash"
echo "Test" | logstash -f /usr/share/logstash/pipeline/pipeline.conf

echo "Starting daemon..."
logstash-filter-verifier daemon start &
sleep 5
logstash-filter-verifier daemon run --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test.yml --add-missing-id
EOT

chmod a+x run_tests.sh
EOF

CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]

Build and run the above Dockerfile with:

DOCKER_BUILDKIT=1 docker build --tag test .
docker run --rm test
breml commented

This one is interesting. I searched the source of LFV and I was not able to find a location, where LFV operates on the @timestamp field.

Could it be, that the @timestamp field gets added by Logstash on the pipeline input plugin. Due to the fact that we use multiple pipelines in LFV 2, the event passes an other input plugin (the pipeline to pipeline input) after the filter-pipeline-under-test and this input plugin adds the @timestamp field again?

breml commented

To me, it looks like Logstash is adding the timestamp in the pipeline-to-pipeline communication.

This can be shown with the following Dockerfile:

# syntax=docker/dockerfile:1.3-labs
FROM docker.elastic.co/logstash/logstash:7.15.0

ENV LOGSTASH_FILTER_VERIFIER_VERSION v2.0.0-beta.2

USER root
RUN yum clean expire-cache && yum update -y && yum install curl && yum clean all
ADD https://github.com/magnusbaeck/logstash-filter-verifier/releases/download/${LOGSTASH_FILTER_VERIFIER_VERSION}/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz /opt/
RUN tar xvzf /opt/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz -C /opt \
    && mv /opt/logstash-filter-verifier /usr/bin/

USER logstash
RUN <<EOF
mkdir tests
mkdir pipeline/pipeline1
mkdir pipeline/pipeline2

cat <<EOT > /usr/share/logstash/config/pipelines.yml
- pipeline.id: pipeline1
  path.config: "pipeline/pipeline1/pipeline.conf"
- pipeline.id: pipeline2
  path.config: "pipeline/pipeline2/pipeline.conf"
EOT

cat <<EOT > /usr/share/logstash/tests/test.yml
input_plugin: "input"
testcases:
  - input:
      - Test
    expected:
      - message: Test
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline1/pipeline.conf
input { stdin { id => "input" } }
filter { mutate { remove_field => [ "@timestamp" ] } }
output { pipeline { send_to => [ pipeline2 ] } }
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline2/pipeline.conf
input { pipeline { address => pipeline2 } }
output { stdout { } }
EOT

cat <<EOT > /usr/share/logstash/run_tests.sh
echo "Running directly through Logstash"
echo "Test" | logstash &
sleep 30
pkill logstash

echo "Starting daemon..."
logstash-filter-verifier daemon start &
sleep 5
logstash-filter-verifier daemon run --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test.yml --add-missing-id
EOT

chmod a+x run_tests.sh
EOF

CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]

I am not sure, if this can be considered a bug in LFV. I tend to argue, that this is a bug in Logstash it self.

AFAIK the @timestamp field is special in that it isn't an attribute of the Event class and not part of the regular hashmap of fields (similar to how the @metadata fields are in a separate hashmap). Unless you have cooperation from the output plugin I don't think you can omit the @timestamp field.

breml commented

I created elastic/logstash#13333, let us see, what the Logstash devs have to say about this issue.

breml commented

Currently I see little chances, that this gets fixed upstream in Logstash.
But I also don't see a way to fix this in LFV, because using pipelines to execute the test cases is a key part of the new architecture for the daemon mode.

Therefore me personally, I would close this as "won't fix" and maybe the best thing we can do here is mention this somehow in the documentation.

@jgough, @magnusbaeck What do you think?

Annoying, but seems reasonable. Easy to work around by ignoring @timestamp.