logstash-plugins/logstash-codec-netflow

One IPFIX source = OK, but more than one leads to corrupt fields (confused templates)

regulatre opened this issue · 0 comments

Logstash 8.8.2 on Linux x86_64 (container: docker.elastic.co/logstash/logstash:8.8.2)

I run the container with the pipeline directory mapped to my test directory, which contains one simple config file:

Sample invocation:
$ docker run -it -p 9996:9996/udp -v /opt/docker-logstash-receivers/pipelines:/usr/share/logstash/pipeline/ docker.elastic.co/logstash/logstash:8.8.2

My Logstash pipeline config file:

input {
  udp {
    port                 => 1234
    codec                => netflow
    receive_buffer_bytes => 16777216
    workers              => 2
  }
}

filter { }

output {
  stdout { codec => rubydebug }
}

I'm sending IPFIX from a handful of MikroTik routers to the pipeline above. The packets are received and records appear in the rubydebug output, but some of them are badly corrupt. The corruption is obvious because the interface number shows up as a very large, nonsensical value (for example, an ingressInterface of 4178953914 in the sample below). I only see this when more than one router is sending IPFIX to the pipeline. The packets reach the container without NAT, so their original source IP addresses are intact. The expected behavior is that Logstash keeps a separate template for each source and decodes each source IP's packets with that source's own template.
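
The expected behavior can be sketched as a template cache keyed by the exporter's address as well as by observation domain ID and template ID. This is a minimal Python model under stated assumptions, not the codec's actual (Ruby) implementation; `TemplateCache`, `learn` and `lookup` are hypothetical names, and the router addresses are documentation addresses.

```python
from dataclasses import dataclass, field
from typing import Optional

# (field name, length in bytes) pairs, in the order the template lists them.
Layout = list[tuple[str, int]]

@dataclass
class TemplateCache:
    # Hypothetical model: one entry per (exporter IP, observation domain ID, template ID),
    # so two routers that reuse a template ID never overwrite each other's layout.
    templates: dict[tuple[str, int, int], Layout] = field(default_factory=dict)

    def learn(self, exporter_ip: str, domain_id: int, template_id: int, layout: Layout) -> None:
        # A template refresh replaces only the sending exporter's entry.
        self.templates[(exporter_ip, domain_id, template_id)] = layout

    def lookup(self, exporter_ip: str, domain_id: int, template_id: int) -> Optional[Layout]:
        # None means "this exporter has not announced the template yet";
        # another exporter's layout is never borrowed.
        return self.templates.get((exporter_ip, domain_id, template_id))

cache = TemplateCache()
# Two routers announce template 256 in domain 0 with different field layouts.
cache.learn("192.0.2.1", 0, 256, [("sourceMacAddress", 6), ("ingressInterface", 4)])
cache.learn("192.0.2.2", 0, 256, [("ingressInterface", 4), ("sourceMacAddress", 6)])
print(cache.lookup("192.0.2.1", 0, 256))  # [('sourceMacAddress', 6), ('ingressInterface', 4)]
```

Including the exporter address in the key is what keeps a template refresh from one router from clobbering another router's template when their template IDs happen to coincide.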

Using Wireshark, I confirmed that the corruption begins after one router sends a template refresh and a record from a different router then arrives. Logstash appears to be mixing up the routers' templates, possibly keeping only a single one.

In other words, the codec does not seem to distinguish between IPFIX templates from multiple sources. It appears to store a single template, overwrite it with whichever router's template arrives next, and apply that template to all subsequent flows from every source, instead of decoding each source's IPFIX packets with that source's own template.
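
To illustrate why a mixed-up template produces huge interface numbers, the hypothetical snippet below decodes the same 10-byte data record twice: once with the layout its exporter announced, and once with another exporter's layout for the same template ID. The layouts, values and the `decode` helper are made up for illustration; real IPFIX decoding also handles variable-length fields and per-element data types.

```python
import struct

def decode(data: bytes, layout: list[tuple[str, int]]) -> dict[str, int]:
    """Slice `data` into fields per (name, length) pairs, each read as a big-endian unsigned int."""
    fields, offset = {}, 0
    for name, length in layout:
        fields[name] = int.from_bytes(data[offset:offset + length], "big")
        offset += length
    return fields

# Hypothetical record as exporter A encodes it: sourceMacAddress (6 bytes), then ingressInterface (4 bytes).
record = bytes.fromhex("aabbccddeeff") + struct.pack("!I", 3)

layout_a = [("sourceMacAddress", 6), ("ingressInterface", 4)]  # A's own template
layout_b = [("ingressInterface", 4), ("sourceMacAddress", 6)]  # B's template with the same ID

good = decode(record, layout_a)
bad = decode(record, layout_b)
print(good["ingressInterface"])  # 3
print(bad["ingressInterface"])   # 2864434397
```

With the wrong layout, the first four bytes of the MAC address are read as the interface index, which yields the same kind of implausibly large value seen in the corrupted records.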

Below is a sample corrupted record. Essentially every MAC, IP and interface-number field is corrupt and contains invalid data. This only started after I enabled IPFIX on more than one router.

{
          "host" => {
        "ip" => "<I have omitted this IP>"
    },
      "@version" => "1",
    "@timestamp" => 2023-07-03T21:58:18.000Z,
       "netflow" => {
                  "sourceIPv4PrefixLength" => 110,
                        "flowEndSysUpTime" => 773205508,
                    "postSourceMacAddress" => "00:11:00:00:ff:ff",
                           "ipTotalLength" => 1099511627775,
                "destinationTransportPort" => 392,
                        "ingressInterface" => 4178953914,
                "postNATSourceIPv4Address" => "5.0.0.0",
                      "protocolIdentifier" => 0,
                            "icmpCodeIPv4" => 0,
             "postNAPTSourceTransportPort" => 38400,
                             "isMulticast" => 99,
                          "ipHeaderLength" => 0,
                                 "version" => 10,
                                "igmpType" => 0,
                               "ipVersion" => 255,
                       "sourceIPv4Address" => "110.189.70.3",
        "postNAPTDestinationTransportPort" => 33280,
                  "destinationIPv4Address" => "0.0.0.0",
                        "ipClassOfService" => 0,
                           "tcpWindowSize" => 65280,
               "postDestinationMacAddress" => "96:16:2e:11:2e:00",
                         "egressInterface" => 1,
                      "flowStartSysUpTime" => 4294967062,
           "postNATDestinationIPv4Address" => "0.0.0.0",
                        "packetDeltaCount" => 615348170,
                   "destinationMacAddress" => "00:00:0a:00:00:00",
                    "ipNextHopIPv4Address" => "0.0.220.44",
                         "octetDeltaCount" => 615348170,
             "destinationIPv4PrefixLength" => 74,
                                   "ipTTL" => 158,
                        "udpMessageLength" => 65535,
                            "icmpTypeIPv4" => 64,
                          "tcpControlBits" => 0,
                     "sourceTransportPort" => 0,
                        "sourceMacAddress" => "ff:ff:ff:ff:dc:2c"
    }
}
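
As a rough triage aid, records like the one above can be flagged by checking for values that cannot occur in a valid IPv4 flow. The rules in this hypothetical `suspicious_fields` helper are assumptions chosen for triage (ipVersion must be 4 or 6, an IPv4 prefix length cannot exceed 32, and an IPv4 packet's total length fits in 16 bits); it runs on a dict copied from the rubydebug output, not inside Logstash.

```python
def suspicious_fields(netflow: dict) -> list[str]:
    """Return names of fields whose values are impossible for an IPv4 flow (triage heuristic)."""
    problems = []
    # ipVersion should be 4 or 6 when present.
    if netflow.get("ipVersion") not in (None, 4, 6):
        problems.append("ipVersion")
    # An IPv4 prefix length cannot exceed 32 bits.
    for name in ("sourceIPv4PrefixLength", "destinationIPv4PrefixLength"):
        if name in netflow and not 0 <= netflow[name] <= 32:
            problems.append(name)
    # Assumes IPv4 flows: the IPv4 Total Length header field is 16 bits wide
    # (IPv6 jumbograms could legitimately exceed this and are not handled here).
    if netflow.get("ipTotalLength", 0) > 65535:
        problems.append("ipTotalLength")
    return problems

# Values copied from the corrupted record above.
sample = {
    "ipVersion": 255,
    "sourceIPv4PrefixLength": 110,
    "destinationIPv4PrefixLength": 74,
    "ipTotalLength": 1099511627775,
}
print(suspicious_fields(sample))
# ['ipVersion', 'sourceIPv4PrefixLength', 'destinationIPv4PrefixLength', 'ipTotalLength']
```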

To reproduce:

1. Use the Logstash configuration above and start the container with the command line above (which bridges the interface to avoid NAT, since NAT would lose the source IP).
2. Configure one router to send IPFIX to the server running Logstash. Normal IPFIX records are printed to stdout.
3. Enable IPFIX on a second router and point it at the same Logstash server.
4. Records printed to stdout now show corrupt fields like the example above. From what I can tell, about 5-20% of records are lost to corruption.
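
Without two physical routers, a synthetic reproduction might look like the sketch below, which hand-builds IPFIX messages following the RFC 7011 wire format: a 16-byte message header, a Template Set (Set ID 2), and a Data Set whose Set ID is the template ID. It assumes both exporters reuse template ID 256 and observation domain 0, and that the sending host owns two addresses so the packets carry distinct source IPs; `ROUTER_A_IP`, `ROUTER_B_IP`, the target address and all helper names are hypothetical placeholders. Only the Information Element numbers (10 = ingressInterface, 56 = sourceMacAddress) come from the IANA IPFIX registry.

```python
import socket
import struct
import time

IPFIX_VERSION = 10
TEMPLATE_SET_ID = 2
LOGSTASH = ("127.0.0.1", 1234)  # assumption: the udp input port from the config above
ROUTER_A_IP = "192.0.2.10"      # hypothetical: must be an address owned by the sending host
ROUTER_B_IP = "192.0.2.20"      # hypothetical: a second address on the same host

def ipfix_message(domain_id: int, seq: int, sets: list[tuple[int, bytes]]) -> bytes:
    """Prefix each (set_id, body) with a 4-byte set header, then add the 16-byte message header."""
    payload = b"".join(struct.pack("!HH", set_id, 4 + len(body)) + body for set_id, body in sets)
    header = struct.pack("!HHIII", IPFIX_VERSION, 16 + len(payload), int(time.time()), seq, domain_id)
    return header + payload

def template_record(template_id: int, fields: list[tuple[int, int]]) -> bytes:
    """Template ID, field count, then one (IE id, length) specifier per field (no enterprise bit)."""
    body = struct.pack("!HH", template_id, len(fields))
    for ie_id, length in fields:
        body += struct.pack("!HH", ie_id, length)
    return body

def send(packet: bytes, bind_ip: str, target: tuple[str, int] = LOGSTASH) -> None:
    """Send one UDP datagram from a specific local source address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_ip, 0))
        sock.sendto(packet, target)

# Router A: template 256 = sourceMacAddress (IE 56, 6 bytes), then ingressInterface (IE 10, 4 bytes).
tmpl_a = template_record(256, [(56, 6), (10, 4)])
data_a = bytes.fromhex("aabbccddeeff") + struct.pack("!I", 3)
# Router B: the same template ID 256, but with the two fields in the opposite order.
tmpl_b = template_record(256, [(10, 4), (56, 6)])

# Sequence number 0 throughout: no data records precede these messages in either stream.
templ_msg_a = ipfix_message(0, 0, [(TEMPLATE_SET_ID, tmpl_a)])
templ_msg_b = ipfix_message(0, 0, [(TEMPLATE_SET_ID, tmpl_b)])
data_msg_a = ipfix_message(0, 0, [(256, data_a)])
```

The intended sequence is `send(templ_msg_a, ROUTER_A_IP)`, then `send(templ_msg_b, ROUTER_B_IP)`, then `send(data_msg_a, ROUTER_A_IP)`. With per-source templates, the final record should decode with ingressInterface 3; if router B's layout were applied instead, the interface would come out as 0xaabbccdd (2864434397).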

Thank you
Brad