innogames/graphite-ch-optimizer

SQL driver produces unwanted output to stdout despite --output value

Closed this issue · 10 comments

Maybe not the correct place to raise this issue, but even when I set --log-level or --output, I still get a lot of debug info on stdout:

[clickhouse]host(s)=localhost:9000, database=default, username=default
[clickhouse][dial] secure=false, skip_verify=false, strategy=random, ident=1, server=0 -> 127.0.0.1:9000
[clickhouse][connect=1][hello] -> Golang SQLDriver 1.1.54213
[clickhouse][connect=1][hello] <- ClickHouse 20.8.54438 (Europe/Brussels)
[clickhouse][connect=1]-> ping
[clickhouse][connect=1][process] <- pong
[clickhouse][connect=1][prepare]
SELECT
	concat('`', p.database, '`.`', p.table, '`') AS table,
	p.partition_id AS partition_id,
	p.partition AS partition_name,
	max(g.age) AS age,
	countDistinct(p.name) AS parts,
	toDateTime(max(p.max_date + 1)) AS max_time,
	max_time + age AS rollup_time,
	min(p.modification_time) AS modified_at
FROM system.parts AS p
INNER JOIN
(
	SELECT
		Tables.database AS database,
		Tables.table AS table,
		age
	FROM system.graphite_retentions
	ARRAY JOIN Tables
	GROUP BY
		database,
		table,
		age
) AS g ON (p.table = g.table) AND (p.database = g.database)
-- toDateTime(p.max_date + 1) + g.age AS unaggregated rollup_time
WHERE p.active AND ((toDateTime(p.max_date + 1) + g.age) < now())
GROUP BY
	table,
	partition_name,
	partition_id
-- modified_at < rollup_time: the merge has not been applied for the current retention policy
-- parts > 1: merge should be applied because of new parts
-- modified_at < (now() - @Interval): we want to merge active partitions only once per interval,
--   so do not touch partitions with current active inserts
HAVING ((modified_at < rollup_time) OR (parts > 1))
	AND (modified_at < (now() - @Interval))
ORDER BY
	table ASC,
	partition_name ASC,
	age ASC
[clickhouse][connect=1][send query]
SELECT
	concat('`', p.database, '`.`', p.table, '`') AS table,
	p.partition_id AS partition_id,
	p.partition AS partition_name,
	max(g.age) AS age,
	countDistinct(p.name) AS parts,
	toDateTime(max(p.max_date + 1)) AS max_time,
	max_time + age AS rollup_time,
	min(p.modification_time) AS modified_at
FROM system.parts AS p
INNER JOIN
(
	SELECT
		Tables.database AS database,
		Tables.table AS table,
		age
	FROM system.graphite_retentions
	ARRAY JOIN Tables
	GROUP BY
		database,
		table,
		age
) AS g ON (p.table = g.table) AND (p.database = g.database)
-- toDateTime(p.max_date + 1) + g.age AS unaggregated rollup_time
WHERE p.active AND ((toDateTime(p.max_date + 1) + g.age) < now())
GROUP BY
	table,
	partition_name,
	partition_id
-- modified_at < rollup_time: the merge has not been applied for the current retention policy
-- parts > 1: merge should be applied because of new parts
-- modified_at < (now() - 259200): we want to merge active partitions only once per interval,
--   so do not touch partitions with current active inserts
HAVING ((modified_at < rollup_time) OR (parts > 1))
	AND (modified_at < (now() - 259200))
ORDER BY
	table ASC,
	partition_name ASC,
	age ASC
[clickhouse][connect=1][query settings] optimize_throw_if_noop=1&receive_timeout=3600
[clickhouse][connect=1][read meta] <- data: packet=1, columns=8, rows=0
[clickhouse][connect=1][rows] <- profiling: rows=0, bytes=0, blocks=0
[clickhouse][connect=1][rows] <- progress: rows=130, bytes=64998, total rows=0
[clickhouse][connect=1][rows] <- data: packet=1, columns=0, rows=0, elapsed=4.951µs
[clickhouse][connect=1][rows] <- progress: rows=0, bytes=0, total rows=0
[clickhouse][connect=1][rows] <- end of stream
[clickhouse][connect=1][rows] close
[clickhouse][connect=1][stmt] close

I guess this is something that can be set on the SQL driver, if not, this might be forwarded to the clickhouse driver?

Hey Thomas. That's something you could easily get rid of with overriding the DSN parameter. It has the default value tcp://localhost:9000?&optimize_throw_if_noop=1&receive_timeout=3600&debug=true with debug=true. Just set your own w/o debug in a config file.

It's interesting that --output doesn't affect the driver debug info, though. I'll take a look at it for sure.

Thanks for the report.

I tested with --output /dev/null and still got this output, I will try with modified DSN now.

debug=false indeed solves this, but why is it by default enabled? Can that be changed?

Or otherwise, please add a default config when installing rpm (and let the systemd service use it)

Well, the default config is provided in neither RPM nor DEB packages. It's pretty much the same as default values, so there is /etc/graphite-ch-optimizer/config.toml.example in both to not mess up with user's configuration files during the upgrade. You could use it as a template and change everything to fit your requirements, for example, lower the optimize-interval and alter the server-dsn string.

When one uses the package first time, it's pretty useful to see, what's going on behind the scene. I think default debug=true is the reasonable option for the first shots.

I'll try to fix the behavior of the driver though since it should respect the --output value.

I don't know for deb, but rpm does not overwrite config files, it creates a second one for example config.tom.rpmnew. The problem actually is that the provided systemd.service does not specifies to use a config file.. Changing that might cause problems if rpm is set to overwrite it on upgrade.

Ah, I get the concern. The default value for config file is /etc/graphite-ch-optimizer/config.toml. Unfortunately, it's not so obvious from https://github.com/innogames/graphite-ch-optimizer/blob/f3281ad848/graphite-ch-optimizer.go#L231-L239. It's something worth adding to --help output, for sure.

You could see all files the binary tries to open in the output of strace -e file graphite-ch-optimizer command.

upd: there's a branch issue13, you could see, at least docstring is fixed there Filename of the custom config. CLI arguments override it (default "/etc/graphite-ch-optimizer/config.toml")

Ooh okay, then there is no problem anymore. Thanks!

The only problem is still that the driver does not respect the output setting.

Should be solved in the branch issue13. Could you check, please?

Looks good.