file logger: buffer still full despite "chan-buffer-size" changes
Closed this issue · 13 comments
Many thanks for your time and effort with go-dnscollector!
We are using go-dnscollector as part of the pDNSSOC project (https://github.com/CERN-CERT/pDNSSOC)
We process incoming dnstap files for a number of resolvers with little/no control over the files size or content.
We feed go-dnscollectors with dnstap files in a directory "watched" by the ingestor. Each file is about 50MB in size.
Unfortunately we suffer from a buffer full, dropping packets error.
We typically loose hundreds of thousands of packets, and our logs are flooded with "buffer full" error messages.
We tried increasing the value of "chan-buffer-size" to a larger value (e.g. 10485760) but it does not change the amount of errors or the amount of lost packets in each error.
I am not sure this is a bug or just a configuration issue, or just us doing it wrong, or what we can do about this.
Any advice welcome.
Current configuration:
- name: dns_server_1
file-ingestor:
enable: true
watch-dir: /var/pdnssoc_input/dns_server_1
watch-mode: dnstap
delete-after: true
chan-buffer-size: 10485760
transforms:
filtering:
log-queries: true
log-replies: true
drop-queryip-file: /etc/dnscollector/dns_servers.txt
Could you share more details ?
- the complete config file
- version used
- and output logs with buffer full messages
Sure! Thanks Denis.
We are using 0.41.0.
Config:
global:
trace:
verbose: true
text-format: "timestamp-rfc3339ns identity operation rcode queryip queryport family protocol length qname qtype latency"
# default text field delimiter
text-format-delimiter: " "
# default text field boundary
text-format-boundary: "\""
multiplexer:
collectors:
- name: dnstap-dns-server-1
file-ingestor:
enable: true
watch-dir: /var/pdnssoc_input/dns-server-1
watch-mode: dnstap
delete-after: true
chan-buffer-size: 10485760
transforms:
filtering:
log-queries: true
log-replies: true
drop-queryip-file: /root/dns_servers.txt
loggers:
- name: filelogdomains
logfile:
file-path: /var/dnscollector/matches/matches_domains.json
mode: json
transforms:
filtering:
keep-fqdn-file: '/var/dnscollector/misp_domains.txt'
- name: filelogips
logfile:
file-path: /var/dnscollector/matches/matches_ips.json
mode: json
transforms:
filtering:
keep-rdata-file: '/var/dnscollector/misp_ips.txt'
- name: fileall
logfile:
file-path: /var/dnscollector/queries.log
mode: json
max-size: 100
max-files: 100
postrotate-command: "/var/dnscollector/postrotate_query.sh"
postrotate-delete-success: true
routes:
- from: [ dnstap-dns-server-1 ]
to: [ filelogdomains, filelogips, fileall ]
Log files:
INFO: 2024/03/06 19:07:30.196568 collector - [dnstap-dns-server-1] fileingestor - processing of [dnstap-2024-03-02-18:56:53.fstrm] terminated
INFO: 2024/03/06 19:07:30.196586 collector - [dnstap-dns-server-1] fileingestor - delete file [dnstap-2024-03-02-18:56:53.fstrm]
INFO: 2024/03/06 19:07:30.439693 collector - [dnstap-dns-server-1] fileingestor - processing of [dnstap-2024-03-02-10:56:53.fstrm] terminated
INFO: 2024/03/06 19:07:30.439721 collector - [dnstap-dns-server-1] fileingestor - delete file [dnstap-2024-03-02-10:56:53.fstrm]
INFO: 2024/03/06 19:07:30.608676 collector - [dnstap-dns-server-1] fileingestor - processing of [dnstap-2024-03-02-02:56:53.fstrm] terminated
INFO: 2024/03/06 19:07:30.608694 collector - [dnstap-dns-server-1] fileingestor - delete file [dnstap-2024-03-02-02:56:53.fstrm]
ERROR: 2024/03/06 19:07:33.763154 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 364989 packet(s) dropped
The error would repeat over and over:
ERROR: 2024/03/03 04:16:40.594323 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 647689 packet(s) dropped
ERROR: 2024/03/03 04:16:50.602279 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 741568 packet(s) dropped
ERROR: 2024/03/03 04:17:00.726706 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 665186 packet(s) dropped
ERROR: 2024/03/03 04:17:10.824914 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 318581 packet(s) dropped
ERROR: 2024/03/03 04:17:20.825447 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 123588 packet(s) dropped
ERROR: 2024/03/03 04:17:30.895391 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 84076 packet(s) dropped
ERROR: 2024/03/03 04:18:43.438253 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 232958 packet(s) dropped
ERROR: 2024/03/03 04:18:53.484891 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 646237 packet(s) dropped
ERROR: 2024/03/03 04:19:03.654005 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 484388 packet(s) dropped
ERROR: 2024/03/03 04:19:13.786948 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 403251 packet(s) dropped
(Just to give a volume indication, feeding on 34M dnstap file results in " dnstap - logger[fileall] buffer is full, 2003 packet(s) dropped" with the configuration file from the previous message)
In your case, the bottleneck is the fileall
logger, you can see that with logger[fileall] buffer is full, 647689 packet(s) dropped
.This message indicates that the buffer of the fileall
logger is full.
It occured because this logger is too slow regarding the amount of incoming data. To avoid to block the collector processing, the overhead of packets are dropped.
Could you try some adjustments
- If you remove the postrotate command, the drop of packet still occurs ?
- Try to increase the buffer of the logger
fileall
? Be carefull with this parameter (big impact on memory usage)
chan-buffer-size: 131070
Good thinking, thank you!
After testing, it turns out that:
- The postrotate script is causing the issue -- Disabling it solves the dropped packet issue.
- The buffer size of the logger
fileall
does not play a role. In fact, the default 65535 is sufficient.
For the record, the postrotate script we use is:
#!/bin/bash
BACKUP_FOLDER=/var/dnscollector/queries/$(date +%Y-%m-%d)
mkdir -p $BACKUP_FOLDER
FILE_NAME=$(basename $1 .log)
jq -c '. | {timestamp: .dnstap."timestamp-rfc3339ns", query: .dns.qname, client: .network."query-ip", server: .network."response-ip", client_id: .dnstap.identity , answers: .dns."resource-records".an }' $1 > $BACKUP_FOLDER/$FILE_NAME.json && gzip -S .gz_minified $BACKUP_FOLDER/$FILE_NAME.json
I will continue to investigate what is causing the bottleneck there and update this ticket with any possible solution.
@arvchristos @romainw The best way will be that the pdnssoc-cli uses natively the 'flat-json' model generated by DNScollector to avoid a new JSON generation from an external script. Also the file
logger support gzip
compression.
I think that the pDNSSOC and DNScollector integration can be greatly improved to support a higher load :)
Hey @dmachard !
The main reason why we relied on an external script for JSON generation is to be able to store only the required for correlation fields and save on size. I am not sure whether there is any transformer to remove fields from the JSON output but it seems useful for space optimization (combined with gzip
of course as you correctly pointed out).
We essentially need to store minified logs so that we can run correlation on past data with new indicators of compromise, apart from the live correlation we do with various other sources that go-dnscollector
supports.
I am not sure whether there is any transformer to remove fields from the JSON output
Great idea, can you open a ticket to track this?
I am not sure whether there is any transformer to remove fields from the JSON output
Great idea, can you open a ticket to track this?
Sure, here it is:
Custom JSON relabeling has been implemented. It's will be implemented in next release v0.43.0.
The documentation is here.
Perhaps it can be used to improved performance.
Hi @dmachard
Im seeing similar issue with v1.0.0 while writing it to a local file. if you can suggest any tweak is needed in the config file.
i have tried increasing this values chan-buffer-size: 0 to 30720
`global:
trace:
verbose: true
pipelines:
-
name: tap
dnstap:
sock-path: /var/run/dnscollector/dnstap.sock
routing-policy:
forward: [ file ] -
name: file
logfile:
file-path: "/var/log/dnscollector/dnstap.log"
compress: true
max-size: 30720
max-files: 7
mode: json
chan-buffer-size: 0
text-format: "timestamp identity qr operation rcode queryip queryport responseip responseport family protocol length qname qtype latency ttl"`
Oct 07 18:06:16 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:06:16.724111 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 1596 dnsmessage(s) dropped
Oct 07 18:06:26 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:06:26.724988 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6556 dnsmessage(s) dropped
Oct 07 18:06:36 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:06:36.725535 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6724 dnsmessage(s) dropped
Oct 07 18:06:46 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:06:46.726453 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5972 dnsmessage(s) dropped
Oct 07 18:06:56 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:06:56.726685 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 7065 dnsmessage(s) dropped
Oct 07 18:07:06 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:06.727064 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6029 dnsmessage(s) dropped
Oct 07 18:07:16 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:16.727741 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5829 dnsmessage(s) dropped
Oct 07 18:07:26 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:26.728751 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6174 dnsmessage(s) dropped
Oct 07 18:07:36 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:36.729660 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5778 dnsmessage(s) dropped
Oct 07 18:07:46 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:46.730720 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6140 dnsmessage(s) dropped
Oct 07 18:07:56 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:56.731152 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6053 dnsmessage(s) dropped
Oct 07 18:08:06 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:06.732230 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5895 dnsmessage(s) dropped
Oct 07 18:08:16 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:16.732329 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6083 dnsmessage(s) dropped
Oct 07 18:08:26 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:26.732412 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6050 dnsmessage(s) dropped
Oct 07 18:08:36 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:36.732935 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5488 dnsmessage(s) dropped
Oct 07 18:08:46 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:46.733642 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5914 dnsmessage(s) dropped
Oct 07 18:08:56 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:56.733756 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 7116 dnsmessage(s) dropped
Oct 07 18:09:06 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:06.733896 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5845 dnsmessage(s) dropped
Oct 07 18:09:16 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:16.733973 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5587 dnsmessage(s) dropped
Oct 07 18:09:26 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:26.734995 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5570 dnsmessage(s) dropped
Oct 07 18:09:36 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:36.735098 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5593 dnsmessage(s) dropped
Oct 07 18:09:46 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:46.735690 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5701 dnsmessage(s) dropped
Oct 07 18:09:56 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:56.736378 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5875 dnsmessage(s) dropped
Oct 07 18:10:06 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:10:06.736505 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5680 dnsmessage(s) dropped
Oct 07 18:10:16 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:10:16.737712 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 4175 dnsmessage(s) dropped
Thanks.
Hi @arvchristos @romainw,
I pushed a fix in the head of the repository to ensure the post-rotate command no longer blocks execution. I think this should be helpful for you!
Fixed in release 1.2.0. If you encounter any further issues, please feel free to open a new ticket. Thank you for your feedback!