Clickhouse schema weirdness
I've noticed that a number of the CH tables include the raw timestamp in the ORDER BY. Instead, you should probably use a truncated timestamp, e.g. rounded to the minute (or at least to the second); otherwise there's not much point in the MVs compared to just sampling from the raw table itself.
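For what it's worth, here is a minimal sketch of what I mean (the view name and table/column names are hypothetical, not the actual dnsmonster schema), keying the MV on a per-minute bucket instead of the raw timestamp:

-- hypothetical sketch: per-minute bucket in the ORDER BY so the MV actually compacts rows
CREATE MATERIALIZED VIEW dns_per_minute_mv
ENGINE = SummingMergeTree()
ORDER BY (t, Question)
AS SELECT
    toStartOfMinute(timestamp) AS t,
    Question,
    count() AS c
FROM DNS_LOG
GROUP BY t, Question;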
Additionally, there are several places where you sum() a value such as DoBit, which is a UInt8 in the primary table. It would be better to cast those to UInt64 before summing to avoid overflows.
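As a rough illustration of the cast (again with assumed table/column layout, not the real schema):

-- hypothetical sketch: cast the UInt8 flag up before summing, per the suggestion above,
-- so the summed/stored value is a UInt64 rather than a UInt8
SELECT
    toStartOfMinute(timestamp) AS t,
    sum(toUInt64(DoBit)) AS DoBitCount
FROM DNS_LOG
GROUP BY t;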
a lot of these tables have changed over time and they're overdue for a revamp and cleanup. happy to accept a PR for these :)
Probably related to the issue I'm getting.
I just installed dnsmonster + ClickHouse in a Docker container on a separate host and created all tables according to the schema in this repo.
Now, when running dnsmonster with output to ClickHouse enabled, I'm getting:
server:~# dnsmonster --config /etc/dnsmonster/dnsmonster.ini
INFO[2023-06-03T19:45:09+01:00] Starting DNStap capture
INFO[2023-06-03T19:45:09+01:00] listening on DNStap socket unix:///tmp/dnstap.sock
INFO[2023-06-03T19:45:09+01:00] socket exists, will try to overwrite the socket
INFO[2023-06-03T19:45:09+01:00] Creating the dispatch Channel
INFO[2023-06-03T19:45:09+01:00] Creating Clickhouse Output Channel
INFO[2023-06-03T19:45:09+01:00] Creating Elastic Output Channel
INFO[2023-06-03T19:45:09+01:00] skipping skipDomains refresh since it's not provided
INFO[2023-06-03T19:45:09+01:00] skipping allowDomains refresh since it's not provided
INFO[2023-06-03T19:45:09+01:00] Creating handler #0
INFO[2023-06-03T19:45:09+01:00] Creating handler #1
INFO[2023-06-03T19:45:09+01:00] Elasticsearch returned with code 200 and version 8.6.2
INFO[0004] /tmp/dnstap.sock: accepted connection 1
WARN[2023-06-03T19:45:19+01:00] failed to convert metrics to JSON.
2023-06-03T19:45:19+01:00 metrics:
INFO[2023-06-03T19:45:19+01:00] ipv4 flushed: 0, closed: 0
INFO[2023-06-03T19:45:19+01:00] ipv6 flushed: 0, closed: 0
panic: runtime error: index out of range [0] with length 0
goroutine 11 [running]:
github.com/ClickHouse/clickhouse-go/v2/lib/column.IPv6ToBytes({0x0?, 0x10?, 0xc000122030?})
/root/go/pkg/mod/github.com/!click!house/clickhouse-go/v2@v2.5.0/lib/column/ipv6.go:247 +0x2cf
github.com/ClickHouse/clickhouse-go/v2/lib/column.(*IPv6).AppendRow(0xc000040810, {0x122eee0?, 0x22e1300?})
/root/go/pkg/mod/github.com/!click!house/clickhouse-go/v2@v2.5.0/lib/column/ipv6.go:215 +0x345
github.com/ClickHouse/clickhouse-go/v2/lib/proto.(*Block).Append(0xc000583d00, {0xc0002b4c60?, 0x11, 0x10?})
/root/go/pkg/mod/github.com/!click!house/clickhouse-go/v2@v2.5.0/lib/proto/block.go:62 +0x1a7
github.com/ClickHouse/clickhouse-go/v2.(*batch).Append(0xc000098280, {0xc0002b4c60?, 0x0?, 0x1c90b58?})
/root/go/pkg/mod/github.com/!click!house/clickhouse-go/v2@v2.5.0/conn_batch.go:122 +0x48
github.com/mosajjal/dnsmonster/internal/output.clickhouseConfig.clickhouseOutputWorker({{0xc0005f51c0, 0x1, 0x1}, {0xc000510cd5, 0x7}, {0xc0001656d5, 0x20}, {0xc000510cf5, 0xa}, 0x0, ...}, ...)
/opt/dnsmonster/internal/output/clickhouse.go:198 +0xdb7
github.com/mosajjal/dnsmonster/internal/output.clickhouseConfig.Output.func1()
/opt/dnsmonster/internal/output/clickhouse.go:148 +0x58
golang.org/x/sync/errgroup.(*Group).Go.func1()
/root/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
/root/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0xa5
There's not much information in the output, but my guess is that the schema is the problem? When I ran the SELECT DISTINCT Question
example that was commented below the table schema, it complained about the timestamp column not existing/not being defined. Is there a more up-to-date schema or setup I can use to deploy? I'm not that familiar with ClickHouse, as I'm just now trying to switch from ES to ClickHouse due to high load/data volume.