dnsmonser not sending all packets from pcap to clickhouse
zeromind opened this issue · 11 comments
I noticed that the number of DNS packets stored in my local clickhouse instance were always multiples of the clickhousebatchsize
:
When using dnsmonster
to process pcap files and clickhousebatchsize
set to non-zero, the clickhouse output will not send all results to clickhouse.
Scenario:
clickhousebatchsize
=100000
(default)- output to clickhouse
- pcaps with number of DNS packets which is not multiple of the batch size, e.g.
1234567
Result:
dnsmonster
sends info for only 1200000
packets to clickhouse, in batches of 100000
.
Missing the 34567
packets as the batch size was not reached.
The code section responsible is
dnsmonster/output/clickhouse.go
Lines 222 to 231 in 928d15f
I did not see anything that makes sure the remainder is being sent to clickhouse when the end of the pcap file has been reached.
Hi,
Yes that's a known limitation of all output batch operations. The previous workaround had a big performance impact so I had to remove it to cater for best possible performance in live packet capture scenarios. For ingesting from pcap
, I suggest you use --clickhousedelay=1s
and see if that solves your issue.
The more comprehensive solution is to "drain" current workloads when the pcap is finished or live interface is not available anymore. Takes a bit of time to implement though..
Thanks for the quick reply,
the workaround with --clickhousedelay=1s
seems be good enough for now, the number of missing packets went down significantly;
for the one system here it's only missing 150k
packets out of 530m
in clickhouse now.
It was missing 95m
before, due to the high number of pcap files (1440/day) and high batch size.
cool. feel free to lower the 1s to even a lower time (=100ms) to see if it becomes less of a problem. let's also keep this open so I can solve it before 1.x
For my use-case of processing many continuous pcap files for a system/day, I found Wireshark's mergecap that can merge/concatinate pcaps.
So I've switched to that, and feeding the concatitinated pcaps to dnsmonster via stdin, with a clickhousebatchsize
of 1000
.
I've not verified this yet, but this should lose at most 999
packets per day here, which is potentially much less than with clickhousedelay
, and does not change depending on processing speed.
command looks something like this: mergecap -F pcap -a -w - *.pcap | ./dnsmonster --noetherframe --pcapfile=- [...]
@zeromind please build the latest commit and have another test. it should fix the problem entirely. let me know if there's any issues or hangups/timeouts.
Unfortunarely, it appears that something is very broken now, dnsmonster gets stuck trying to process a very small pcap file (518 dns packets), even with the file output:
$ packetq --csv -s "select count(*) from dns;" test.20221009.000100
"count(1)"
518
dnsmonster from most recent commit built via docker
$ ./dnsmonster-linux-amd64.bin --version
FATA[0000] dnsmonster version: 80abcc3
$ ./dnsmonster-linux-amd64.bin --loglevel=4 --noetherframe --pcapfile test.20221009.000100 --fileoutputtype=1 --fileoutputpath=/tmp/dnsmonster.foobar.output
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/outputs.go:24 main.setupOutputs() Creating the dispatch Channel
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/output/file.go:46 github.com/mosajjal/dnsmonster/output.fileConfig.Initialize() Creating File Output Channel
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/capture/packet.go:148 github.com/mosajjal/dnsmonster/capture.captureConfig.StartPacketDecoder() Creating handler #0
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/capture/packet.go:148 github.com/mosajjal/dnsmonster/capture.captureConfig.StartPacketDecoder() Creating handler #1
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/capture/pcapfile.go:31 github.com/mosajjal/dnsmonster/capture.initializeOfflinePcap() Using File: test.20221009.000100
WARN[2022-10-10T13:45:08+02:00]/opt/dnsmonster/capture/pcapfile.go:32 github.com/mosajjal/dnsmonster/capture.initializeOfflinePcap() BPF Filter is not supported in offline mode.
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/capture/nondnstap.go:32 github.com/mosajjal/dnsmonster/capture.captureConfig.StartNonDNSTap() Reading off Pcap file
2022-10-10T13:45:18+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:45:28+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:45:38+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:45:48+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:45:58+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:46:08+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:46:18+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:46:28+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
^CINFO[2022-10-10T13:46:37+02:00]/opt/dnsmonster/main.go:38 main.handleInterrupt.func1() SIGINT Received. Stopping capture...
2022-10-10T13:46:38+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
FATA[2022-10-10T13:46:39+02:00]/opt/dnsmonster/main.go:48 main.handleInterrupt.func1() emergency exit
The output file is empty, same issue when trying to output to clickhouse; it's not sending anything to clickhouse.
thanks for this. I realized I didn't make the changes for stdout
and file
outputs. can you please try again with the latest commit and let me know.
Thanks :)
Almost the same behaviour with a build of the most recent commit:
$ ./main/aa00061c1b8f5d13f9de0b1966518c0ed8500f70/dnsmonster --loglevel=4 --noetherframe --pcapfile test.20221009.000100 --fileoutputtype
=1 --fileoutputpath=/tmp/dnsmonster.foobar.output
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/outputs.go:24 main.setupOutputs() Creating the dispatch Channel
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/output/file.go:46 github.com/mosajjal/dnsmonster/output.fileConfig.Initialize() Creating File Output Channel
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/capture/packet.go:148 github.com/mosajjal/dnsmonster/capture.captureConfig.StartPacketDecoder() Creating handler #0
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/capture/packet.go:148 github.com/mosajjal/dnsmonster/capture.captureConfig.StartPacketDecoder() Creating handler #1
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/outputs.go:45 main.setupOutputs() skipping skipDomains refresh since it's not provided
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/outputs.go:54 main.setupOutputs() skipping allowDomains refresh since it's not provided
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/capture/pcapfile.go:31 github.com/mosajjal/dnsmonster/capture.initializeOfflinePcap() Using File: test.20221009.000100
WARN[2022-10-21T14:37:32+02:00]/opt/dnsmonster/capture/pcapfile.go:32 github.com/mosajjal/dnsmonster/capture.initializeOfflinePcap() BPF Filter is not supported in offline mode.
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/capture/nondnstap.go:32 github.com/mosajjal/dnsmonster/capture.captureConfig.StartNonDNSTap() Reading off Pcap file
2022-10-21T14:37:42+02:00 metrics: {"fileSentToOutput":{"count":285},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-21T14:37:52+02:00 metrics: {"fileSentToOutput":{"count":285},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-21T14:38:02+02:00 metrics: {"fileSentToOutput":{"count":285},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-21T14:38:12+02:00 metrics: {"fileSentToOutput":{"count":285},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
^CINFO[2022-10-21T14:38:14+02:00]/opt/dnsmonster/main.go:37 main.handleInterrupt.func1() SIGINT Received. Stopping capture...
DEBU[2022-10-21T14:38:14+02:00]/opt/dnsmonster/outputs.go:75 main.setupOutputs.func1() exitting out of output dispatcher
metrics: {"fileSentToOutput":{"count":285},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
It still does not quit on its own, and still does not find all packets.
But the output file contains 285 out of the 518 packets out of the pcap file.
thanks mate. any chance you can share the pcap
sample with me so I can test. I suspect an issue with noetherframe
option which I haven't tested before.
hopefully this is finally solved today. feel free to re-open if the latest commit still doesn't solve the issue