innogames/graphite-ch-optimizer

Understanding the purpose of this project

Closed this issue · 16 comments

Sorry for the dumb question, but I am not able to understand the purpose of this project. I came across it while searching for information about retention in ClickHouse. Would it help me solve the concern I mentioned in ClickHouse/ClickHouse#20087?

Hey. I'm not sure which one you are mentioning.

If it's the latest one, about a way to drop metrics older than some age, then definitely no. The project solves only one problem: it optimizes partitions that have crossed a particular age and must be optimized.

Regarding your question in ClickHouse/ClickHouse#20087 (comment), I have an idea implemented in our system; I'll share it in the issue.

@ervikrant06 is there anything else I can help you with?

@Felixoid I am very thankful for your input in that GitHub comment.

If I understand correctly, can this project be useful for the case where inactive partitions are not getting removed? Do inactive partitions indicate partitions which were merged according to the rollup policy but didn't get deleted from the system?

clickhouse2-0.clickhouse-hs.ns-vaggarwal.svc.cluster.local :) select COUNT(name) from parts where table='graphite' AND active = 0

SELECT COUNT(name)
FROM parts
WHERE (table = 'graphite') AND (active = 0)

Query id: 9068aa99-f9be-49fc-8acd-3591c07080e1

┌─COUNT(name)─┐
│          36 │
└─────────────┘

1 rows in set. Elapsed: 0.003 sec.

clickhouse2-0.clickhouse-hs.ns-vaggarwal.svc.cluster.local :) select COUNT(name) from parts where table='graphite' AND active = 1

SELECT COUNT(name)
FROM parts
WHERE (table = 'graphite') AND (active = 1)

Query id: cb834734-ae75-422c-a01c-8c16aa502146

┌─COUNT(name)─┐
│           5 │
└─────────────┘

1 rows in set. Elapsed: 0.002 sec.

can this project be useful for the case where inactive partitions are not getting removed?

No, nothing related.

Do inactive partitions indicate partitions which were merged according to the rollup policy but didn't get deleted from the system?

The setting old_parts_lifetime is responsible for what you mean.
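
For reference, you can check the current value on your server with a query like this (a minimal sketch; system.merge_tree_settings holds the server-wide defaults, and the setting can also be overridden per table):

SELECT
    name,
    value
FROM system.merge_tree_settings
-- old_parts_lifetime: seconds an inactive part is kept before it's removed
WHERE name = 'old_parts_lifetime'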

I'll try to rephrase the project's purpose:
The graphite_rollup retentions are applied to the data ONLY during the merge process. But ClickHouse doesn't launch background merges for stale data, so your policies are never applied. graphite-ch-optimizer watches for partitions that may be optimized to free disk space.
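
For illustration, the statement it ends up running per stale partition looks roughly like this (the table and partition ID are placeholders taken from the examples further down):

-- force a merge of the whole partition, so the graphite_rollup
-- retention rules get applied to its data
OPTIMIZE TABLE graphite.data_lr PARTITION ID '20210222' FINAL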

Thanks for the information. I was trying to build the Docker image for graphite-ch-optimizer using the Dockerfile located at graphite-ch-optimizer/docker/graphite-ch-optimizer and encountered this error.

# docker build -t graphite-ch-optimizer:1.2.1 .
Sending build context to Docker daemon   2.56kB
Step 1/10 : FROM innogames/graphite-ch-optimizer:builder AS build
builder: Pulling from innogames/graphite-ch-optimizer
df20fa9351a1: Already exists
ed8968b2872e: Pull complete
a92cc7c5fd73: Pull complete
dd0b68742591: Pull complete
fb4ae7ba1102: Pull complete
d96f3cc82358: Pull complete
Digest: sha256:619d51e882bae4af38ef8b849798bcf338505bc9fc1e0f372b3ac02cfb8bddd5
Status: Downloaded newer image for innogames/graphite-ch-optimizer:builder
 ---> c5025cfd1059
Step 2/10 : COPY . ./graphite-ch-optimizer
 ---> 9edcdfbcf325
Step 3/10 : WORKDIR ./graphite-ch-optimizer
Removing intermediate container 69c2e7782a5f
 ---> 66b55e480a32
Step 4/10 : RUN make test &&     make -e CGO_ENABLED=0 build &&     make -e CGO_ENABLED=0 packages
 ---> Running in 30e0e087f8b9
make: *** No rule to make target 'test'.  Stop.
The command '/bin/sh -c make test &&     make -e CGO_ENABLED=0 build &&     make -e CGO_ENABLED=0 packages' returned a non-zero code: 2

I tried to use the default latest image available.

[clickhouse]host(s)=clickhouse:9000, database=default, username=default
[clickhouse][dial] secure=false, skip_verify=false, strategy=random, ident=1, server=0 -> 192.168.10.254:9000
[clickhouse][connect=1][hello] -> Golang SQLDriver 1.1.54213
[clickhouse][connect=1][hello] <- ClickHouse 21.1.54443 (UTC)
[clickhouse][connect=1]-> ping
[clickhouse][connect=1][process] <- pong
[clickhouse][connect=1][prepare]
SELECT
        concat('`', p.database, '`.`', p.table, '`') AS table,
        p.partition_id AS partition_id,
        p.partition AS partition_name,
        max(g.age) AS age,
        countDistinct(p.name) AS parts,
        toDateTime(max(p.max_date + 1)) AS max_time,
        max_time + age AS rollup_time,
        min(p.modification_time) AS modified_at
FROM system.parts AS p
INNER JOIN
(
        SELECT
                Tables.database AS database,
                Tables.table AS table,
                age
        FROM system.graphite_retentions
        ARRAY JOIN Tables
        GROUP BY
                database,
                table,
                age
) AS g ON (p.table = g.table) AND (p.database = g.database)
-- toDateTime(p.max_date + 1) + g.age AS unaggregated rollup_time
WHERE p.active AND ((toDateTime(p.max_date + 1) + g.age) < now())
GROUP BY
        table,
        partition_name,
        partition_id
-- modified_at < rollup_time: the merge has not been applied for the current retention policy
-- parts > 1: merge should be applied because of new parts
-- modified_at < (now() - @Interval): we want to merge active partitions only once per interval,
--   so do not touch partitions with current active inserts
HAVING ((modified_at < rollup_time) OR (parts > 1))
        AND (modified_at < (now() - @Interval))
ORDER BY
        table ASC,
        partition_name ASC,
        age ASC
[clickhouse][connect=1][send query]
SELECT
        concat('`', p.database, '`.`', p.table, '`') AS table,
        p.partition_id AS partition_id,
        p.partition AS partition_name,
        max(g.age) AS age,
        countDistinct(p.name) AS parts,
        toDateTime(max(p.max_date + 1)) AS max_time,
        max_time + age AS rollup_time,
        min(p.modification_time) AS modified_at
FROM system.parts AS p
INNER JOIN
(
        SELECT
                Tables.database AS database,
                Tables.table AS table,
                age
        FROM system.graphite_retentions
        ARRAY JOIN Tables
        GROUP BY
                database,
                table,
                age
) AS g ON (p.table = g.table) AND (p.database = g.database)
-- toDateTime(p.max_date + 1) + g.age AS unaggregated rollup_time
WHERE p.active AND ((toDateTime(p.max_date + 1) + g.age) < now())
GROUP BY
        table,
        partition_name,
        partition_id
-- modified_at < rollup_time: the merge has not been applied for the current retention policy
-- parts > 1: merge should be applied because of new parts
-- modified_at < (now() - 259200): we want to merge active partitions only once per interval,
--   so do not touch partitions with current active inserts
HAVING ((modified_at < rollup_time) OR (parts > 1))
        AND (modified_at < (now() - 259200))
ORDER BY
        table ASC,
        partition_name ASC,
        age ASC
[clickhouse][connect=1][query settings] optimize_throw_if_noop=1&receive_timeout=3600
[clickhouse][connect=1][read meta] <- data: packet=1, columns=8, rows=0
[clickhouse][connect=1][rows] <- profiling: rows=0, bytes=0, blocks=0
[clickhouse][connect=1][rows] <- progress: rows=67, bytes=29051, total rows=0
[clickhouse][connect=1][rows] <- data: packet=1, columns=0, rows=0, elapsed=1.605µs
[clickhouse][connect=1][rows] <- progress: rows=0, bytes=0, total rows=0
[clickhouse][connect=1][rows] <- end of stream
[clickhouse][connect=1][rows] close
[clickhouse][connect=1][stmt] close
time="2021-02-12 10:29:02 UTC" level=info msg="Merges will be applied: 0"
time="2021-02-12 10:29:02 UTC" level=info msg="Optimizations round is over, going to sleep for 1h0m0s"

Maybe I am a bit impatient, but it didn't seem to change the retention of metrics older than 7 days; they are still at 1-minute precision.

Here's the way to build an image: https://github.com/innogames/graphite-ch-optimizer#docker

time="2021-02-12 10:29:02 UTC" level=info msg="Merges will be applied: 0"

So you don't have stale partitions. It works perfectly.

Take a look at the queries from https://github.com/innogames/graphite-ch-optimizer#details, please. They can give you some insight into whether or not your partitions would be optimized.

The comments in the logged query explain that there are no partitions satisfying the conditions:

-- modified_at < rollup_time: the merge has not been applied for the current retention policy
-- parts > 1: merge should be applied because of new parts
-- modified_at < (now() - 259200): we want to merge active partitions only once per interval,
--   so do not touch partitions with current active inserts

So, here's what you may do:

  1. Launch the top query from https://github.com/innogames/graphite-ch-optimizer#details; it shows whether there are partitions to optimize without taking --optimize-interval into account.
  2. Most probably, you'll see a table like:
┌─table────────────┬─partition_id─┬─partition─┬──age─┬─parts─┬────────────max_time─┬─────────rollup_time─┬─────────modified_at─┐
│ graphite.data_lr │ 20210222     │ 20210222  │ 3600 │     2 │ 2021-02-23 00:00:00 │ 2021-02-23 01:00:00 │ 2021-02-24 10:03:28 │
│ graphite.data_lr │ 20210223     │ 20210223  │ 3600 │     2 │ 2021-02-24 00:00:00 │ 2021-02-24 01:00:00 │ 2021-02-24 10:03:35 │
└──────────────────┴──────────────┴───────────┴──────┴───────┴─────────────────────┴─────────────────────┴─────────────────────┘
  3. The column you are interested in is modified_at. As you can see, in my case it's around 4-5 hours old. That means I can tweak --optimize-interval=2h, and then the optimization will work; see the sketch below.
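
To preview what a lower interval would do before restarting the daemon, you can re-run the query from your logs with the interval substituted by hand. Here is the same query with 7200 seconds (2h) in place of the logged 259200; nothing else is changed:

SELECT
    concat('`', p.database, '`.`', p.table, '`') AS table,
    p.partition_id AS partition_id,
    p.partition AS partition_name,
    max(g.age) AS age,
    countDistinct(p.name) AS parts,
    toDateTime(max(p.max_date + 1)) AS max_time,
    max_time + age AS rollup_time,
    min(p.modification_time) AS modified_at
FROM system.parts AS p
INNER JOIN
(
    SELECT
        Tables.database AS database,
        Tables.table AS table,
        age
    FROM system.graphite_retentions
    ARRAY JOIN Tables
    GROUP BY database, table, age
) AS g ON (p.table = g.table) AND (p.database = g.database)
WHERE p.active AND ((toDateTime(p.max_date + 1) + g.age) < now())
GROUP BY table, partition_name, partition_id
-- 7200 = 2h: the candidate --optimize-interval under test
HAVING ((modified_at < rollup_time) OR (parts > 1))
    AND (modified_at < (now() - 7200))
ORDER BY table, partition_name, age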

The first query didn't return anything.

Output from the query issued at Wed 24 Feb 2021 04:00:01 PM UTC.

SELECT
    partition,
    name,
    modification_time
FROM system.parts
WHERE active = 0
LIMIT 10

Query id: 8c2c32c7-c0c7-4c9a-8461-431e7f172a9f

┌─partition─┬─name──────────────────────┬───modification_time─┐
│ 197002    │ 197002_348214_350229_1489 │ 2021-02-24 15:51:21 │
│ 197002    │ 197002_348214_350230_1490 │ 2021-02-24 15:51:25 │
│ 197002    │ 197002_348214_350231_1491 │ 2021-02-24 15:51:30 │
│ 197002    │ 197002_348214_350232_1492 │ 2021-02-24 15:51:30 │
│ 197002    │ 197002_348214_350233_1493 │ 2021-02-24 15:51:35 │
│ 197002    │ 197002_348214_350234_1494 │ 2021-02-24 15:51:36 │
│ 197002    │ 197002_348214_350235_1495 │ 2021-02-24 15:51:36 │
│ 197002    │ 197002_348214_350236_1496 │ 2021-02-24 15:51:37 │
│ 197002    │ 197002_348214_350237_1497 │ 2021-02-24 15:51:38 │
│ 197002    │ 197002_348214_350238_1498 │ 2021-02-24 15:51:39 │
└───────────┴───────────────────────────┴─────────────────────┘

What's the output of the query?

SELECT
    database,
    table,
    count() AS parts,
    active,
    partition,
    min(min_date) AS min_date,
    max(max_date) AS max_date,
    formatReadableSize(sum(bytes_on_disk)) AS size,
    sum(rows) AS rows
FROM system.parts AS p
INNER JOIN
(
    SELECT
        Tables.database AS database,
        Tables.table AS table
    FROM system.graphite_retentions
    ARRAY JOIN Tables
    GROUP BY
        database,
        table
) AS g USING (database, table)
GROUP BY
    database,
    table,
    partition,
    active
ORDER BY
    database,
    table,
    partition,
    active ASC

Don't bother yourself with active = 0, or read what they are here

The first query didn't return anything.

Well, if you've launched OPTIMIZE ... FINAL today, the output would be empty, for sure :-j
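
That's the expected behaviour: OPTIMIZE ... FINAL rewrites the partition, so the remaining active parts carry the merge's modification_time, and the modified_at < (now() - Interval) condition filters the partition out for a while. You can see it directly with a query like this (a small sketch; the table name is taken from your output):

SELECT
    partition,
    name,
    modification_time
FROM system.parts
-- right after an OPTIMIZE ... FINAL, the newest active parts
-- all carry the timestamp of the merge itself
WHERE active AND (table = 'graphite')
ORDER BY modification_time DESC
LIMIT 10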

Agreed. Today I received this output from the query mentioned in the repo. I didn't receive any output from this query yesterday, as I had run OPTIMIZE yesterday.

table              partition_id  partition_name  age  parts  max_time             rollup_time          modified_at
shard_01.graphite  20210224      20210224        0    7      2021-02-25 00:00:00  2021-02-25 00:00:00  2021-02-25 06:01:10

Time of running the above query: Thu 25 Feb 2021 07:00:03 AM UTC.

I ran the query you mentioned in your comment; this is the result I received.

database  table     parts  active  partition  min_date    max_date    size        rows
shard_01  graphite  7      0       20210218   2021-02-18  2021-02-18  1.18 GiB    815148257
shard_01  graphite  1      1       20210218   2021-02-18  2021-02-18  172.91 MiB  116449751
shard_01  graphite  6      0       20210219   2021-02-19  2021-02-19  1.12 GiB    845000622
shard_01  graphite  1      1       20210219   2021-02-19  2021-02-19  190.81 MiB  140833437
shard_01  graphite  1      0       20210220   2021-02-20  2021-02-20  175.23 MiB  140102519
shard_01  graphite  1      1       20210220   2021-02-20  2021-02-20  175.23 MiB  140102519
shard_01  graphite  1      0       20210221   2021-02-21  2021-02-21  154.96 MiB  137257847
shard_01  graphite  1      1       20210221   2021-02-21  2021-02-21  154.96 MiB  137257847
shard_01  graphite  1      0       20210222   2021-02-22  2021-02-22  189.77 MiB  142769424
shard_01  graphite  1      1       20210222   2021-02-22  2021-02-22  189.77 MiB  142769424
shard_01  graphite  1      0       20210223   2021-02-23  2021-02-23  345.09 MiB  243028585
shard_01  graphite  1      1       20210223   2021-02-23  2021-02-23  345.09 MiB  243028585
shard_01  graphite  7      0       20210224   2021-02-24  2021-02-24  293.30 MiB  222635457
shard_01  graphite  7      1       20210224   2021-02-24  2021-02-24  293.30 MiB  222635457
shard_01  graphite  1218   0       20210225   2021-02-25  2021-02-25  406.36 MiB  143229941
shard_01  graphite  5      1       20210225   2021-02-25  2021-02-25  54.82 MiB   41964685

2021-02-25 06:01:10 - so the latest write to partition 20210224 was done one hour before the query.

Check if you have historical writes. They bump the min(p.modification_time) AS modified_at column and delay the partition optimization.
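
A sketch of such a check, assuming the graphite table from your output: an old partition whose newest part is much more recent than its date range is receiving historical writes.

SELECT
    partition,
    min(modification_time) AS oldest_part,
    max(modification_time) AS newest_part,
    count() AS parts
FROM system.parts
-- a large gap between the partition's dates and newest_part
-- hints at historical (backfilled) writes into that partition
WHERE active AND (table = 'graphite')
GROUP BY partition
ORDER BY partition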

Even more: if you set --optimize-interval too low, graphite-ch-optimizer (I'll use GCHO later) may optimize the same partition multiple times. OPTIMIZE is a heavy operation; trust me, you don't want that. That's why the default optimize-interval value is relatively high.

Maybe it's worth describing why you don't actually need to care about late optimizations: graphite-clickhouse knows the rollup scheme when you use rollup-conf = 'auto' and aggregates the data itself. To understand the mechanism, read its wiki.

Hey, any updates?

My apologies for the late response; I was distracted by other work. I have yet to go through the documentation you shared to understand how it works under the hood. At the moment, we are good to close this issue. I am very thankful for your help.