mosajjal/dnsmonster

dnsmonser not sending all packets from pcap to clickhouse

zeromind opened this issue · 11 comments

I noticed that the number of DNS packets stored in my local clickhouse instance were always multiples of the clickhousebatchsize:
When using dnsmonster to process pcap files and clickhousebatchsize set to non-zero, the clickhouse output will not send all results to clickhouse.

Scenario:

  • clickhousebatchsize = 100000 (default)
  • output to clickhouse
  • pcaps with number of DNS packets which is not multiple of the batch size, e.g. 1234567

Result:
dnsmonster sends info for only 1200000 packets to clickhouse, in batches of 100000.
Missing the 34567 packets as the batch size was not reached.

The code section responsible is

if int(c%chConfig.ClickhouseBatchSize) == div {
now = time.Now()
err = batch.Send()
if err != nil {
log.Warnf("Error while executing batch: %v", err)
clickhouseFailed.Inc(int64(c))
}
c = 0
batch, _ = conn.PrepareBatch(ctx, "INSERT INTO DNS_LOG")
}

I did not see anything that makes sure the remainder is being sent to clickhouse when the end of the pcap file has been reached.

Hi,

Yes that's a known limitation of all output batch operations. The previous workaround had a big performance impact so I had to remove it to cater for best possible performance in live packet capture scenarios. For ingesting from pcap, I suggest you use --clickhousedelay=1s and see if that solves your issue.

The more comprehensive solution is to "drain" current workloads when the pcap is finished or live interface is not available anymore. Takes a bit of time to implement though..

Thanks for the quick reply,
the workaround with --clickhousedelay=1s seems be good enough for now, the number of missing packets went down significantly;
for the one system here it's only missing 150k packets out of 530m in clickhouse now.
It was missing 95m before, due to the high number of pcap files (1440/day) and high batch size.

cool. feel free to lower the 1s to even a lower time (=100ms) to see if it becomes less of a problem. let's also keep this open so I can solve it before 1.x

For my use-case of processing many continuous pcap files for a system/day, I found Wireshark's mergecap that can merge/concatinate pcaps.

So I've switched to that, and feeding the concatitinated pcaps to dnsmonster via stdin, with a clickhousebatchsize of 1000.
I've not verified this yet, but this should lose at most 999 packets per day here, which is potentially much less than with clickhousedelay, and does not change depending on processing speed.

command looks something like this: mergecap -F pcap -a -w - *.pcap | ./dnsmonster --noetherframe --pcapfile=- [...]

#35 addresses this in a more elegant way with 0 packet loss.

@zeromind please build the latest commit and have another test. it should fix the problem entirely. let me know if there's any issues or hangups/timeouts.

Unfortunarely, it appears that something is very broken now, dnsmonster gets stuck trying to process a very small pcap file (518 dns packets), even with the file output:

$ packetq --csv -s "select count(*) from dns;" test.20221009.000100
"count(1)"
518

dnsmonster from most recent commit built via docker

$ ./dnsmonster-linux-amd64.bin --version
FATA[0000] dnsmonster version: 80abcc3
$ ./dnsmonster-linux-amd64.bin --loglevel=4 --noetherframe --pcapfile test.20221009.000100 --fileoutputtype=1 --fileoutputpath=/tmp/dnsmonster.foobar.output
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/outputs.go:24 main.setupOutputs() Creating the dispatch Channel                
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/output/file.go:46 github.com/mosajjal/dnsmonster/output.fileConfig.Initialize() Creating File Output Channel                 
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/capture/packet.go:148 github.com/mosajjal/dnsmonster/capture.captureConfig.StartPacketDecoder() Creating handler #0                          
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/capture/packet.go:148 github.com/mosajjal/dnsmonster/capture.captureConfig.StartPacketDecoder() Creating handler #1                          
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/capture/pcapfile.go:31 github.com/mosajjal/dnsmonster/capture.initializeOfflinePcap() Using File: test.20221009.000100             
WARN[2022-10-10T13:45:08+02:00]/opt/dnsmonster/capture/pcapfile.go:32 github.com/mosajjal/dnsmonster/capture.initializeOfflinePcap() BPF Filter is not supported in offline mode. 
INFO[2022-10-10T13:45:08+02:00]/opt/dnsmonster/capture/nondnstap.go:32 github.com/mosajjal/dnsmonster/capture.captureConfig.StartNonDNSTap() Reading off Pcap file                        
2022-10-10T13:45:18+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:45:28+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:45:38+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:45:48+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:45:58+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:46:08+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:46:18+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-10T13:46:28+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
^CINFO[2022-10-10T13:46:37+02:00]/opt/dnsmonster/main.go:38 main.handleInterrupt.func1() SIGINT Received. Stopping capture...         
2022-10-10T13:46:38+02:00 metrics: {"fileSentToOutput":{"count":0},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
FATA[2022-10-10T13:46:39+02:00]/opt/dnsmonster/main.go:48 main.handleInterrupt.func1() emergency exit

The output file is empty, same issue when trying to output to clickhouse; it's not sending anything to clickhouse.

thanks for this. I realized I didn't make the changes for stdout and file outputs. can you please try again with the latest commit and let me know.

Thanks :)

Almost the same behaviour with a build of the most recent commit:

$ ./main/aa00061c1b8f5d13f9de0b1966518c0ed8500f70/dnsmonster --loglevel=4 --noetherframe --pcapfile test.20221009.000100 --fileoutputtype
=1 --fileoutputpath=/tmp/dnsmonster.foobar.output
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/outputs.go:24 main.setupOutputs() Creating the dispatch Channel                
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/output/file.go:46 github.com/mosajjal/dnsmonster/output.fileConfig.Initialize() Creating File Output Channel                 

INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/capture/packet.go:148 github.com/mosajjal/dnsmonster/capture.captureConfig.StartPacketDecoder() Creating handler #0                          
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/capture/packet.go:148 github.com/mosajjal/dnsmonster/capture.captureConfig.StartPacketDecoder() Creating handler #1                          
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/outputs.go:45 main.setupOutputs() skipping skipDomains refresh since it's not provided 
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/outputs.go:54 main.setupOutputs() skipping allowDomains refresh since it's not provided 
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/capture/pcapfile.go:31 github.com/mosajjal/dnsmonster/capture.initializeOfflinePcap() Using File: test.20221009.000100             
WARN[2022-10-21T14:37:32+02:00]/opt/dnsmonster/capture/pcapfile.go:32 github.com/mosajjal/dnsmonster/capture.initializeOfflinePcap() BPF Filter is not supported in offline mode. 
INFO[2022-10-21T14:37:32+02:00]/opt/dnsmonster/capture/nondnstap.go:32 github.com/mosajjal/dnsmonster/capture.captureConfig.StartNonDNSTap() Reading off Pcap file                        
2022-10-21T14:37:42+02:00 metrics: {"fileSentToOutput":{"count":285},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-21T14:37:52+02:00 metrics: {"fileSentToOutput":{"count":285},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-21T14:38:02+02:00 metrics: {"fileSentToOutput":{"count":285},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
2022-10-21T14:38:12+02:00 metrics: {"fileSentToOutput":{"count":285},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}
^CINFO[2022-10-21T14:38:14+02:00]/opt/dnsmonster/main.go:37 main.handleInterrupt.func1() SIGINT Received. Stopping capture...         
DEBU[2022-10-21T14:38:14+02:00]/opt/dnsmonster/outputs.go:75 main.setupOutputs.func1() exitting out of output dispatcher            
metrics: {"fileSentToOutput":{"count":285},"fileSkipped":{"count":0},"packetLossPercent":{"value":0},"packetsCaptured":{"value":0},"packetsDropped":{"value":0},"packetsDuplicate":{"count":0},"packetsOverRatio":{"count":0}}

It still does not quit on its own, and still does not find all packets.
But the output file contains 285 out of the 518 packets out of the pcap file.

thanks mate. any chance you can share the pcap sample with me so I can test. I suspect an issue with noetherframe option which I haven't tested before.

hopefully this is finally solved today. feel free to re-open if the latest commit still doesn't solve the issue