seriesByTag invalid regular expression parsing
lexx-bright opened this issue · 5 comments
For seriesByTag("name=up", "job=~us"), the matcher match(x, '^job=.*us') is used. This contradicts graphite-web, where =~ regular expressions are anchored at the start of the tag value:
https://github.com/graphite-project/graphite-web/blob/e058266f8afc293250ee32fd30e3bce4b7ab3579/webapp/graphite/render/functions.py#L5735
INFO [render] query {"request_id": "8bc07b855d62a36520b072379955ff25", "carbonapi_uuid": "6c6fc662-5c65-4ab2-8d19-de96e5be708e", "query": "SELECT Path FROM db001_monitoring_stats.graphite_tagged WHERE ((Tag1='__name__=up') AND (arrayExists((x) -> x LIKE 'job=%' AND match(x, '^job=.*us'), Tags))) AND (Date >='2023-09-11') GROUP BY Path FORMAT TabSeparatedRaw", "result_rows": "0", "result_bytes": "0", "read_rows": "8400", "read_bytes": "3440365", "written_rows": "0", "written_bytes": "0", "total_rows_to_read": "8400", "query_id": "8bc07b855d62a36520b072379955ff25::abb506b6cbb48744", "time": 0.027178179}
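A minimal Python sketch of the mismatch. It assumes that graphite-web applies the value regex anchored at the start of the tag value (modelled here with `re.match`) and that ClickHouse `match()` behaves like an unanchored search (modelled with `re.search`). The helper names are hypothetical. They only mirror the `'^job=.*' + value` pattern visible in the log above.

```python
import re


def graphite_web_style(value_regex: str, value: str) -> bool:
    # Assumption: '=~' is anchored at the start of the tag value.
    return re.match(value_regex, value) is not None


def clickhouse_style(tag: str, value_regex: str, value: str) -> bool:
    # Mirrors the logged pattern '^job=.*us': '^tag=.*' + value regex,
    # searched (unanchored) in the whole "tag=value" string.
    pattern = "^" + tag + "=.*" + value_regex
    return re.search(pattern, tag + "=" + value) is not None


for value in ["us-east", "eu-us"]:
    print(value, graphite_web_style("us", value), clickhouse_style("job", "us", value))
# us-east True True
# eu-us False True
```

Because of the injected `.*`, the value `eu-us` matches even though it does not start with `us`.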
For seriesByTag("name=up", "job=~.*us|job"), the matcher match(x, '^job=.*.*us|job') is used. Since | has the lowest precedence, this is equivalent to match(x, '^job=.*.*us') OR match(x, 'job'), but the expected behaviour is match(x, '^job=.*.*us') OR x = 'job=job'.
INFO [render] query {"request_id": "e392fbdf4112988ab2d4706e667e1efc", "carbonapi_uuid": "d911c941-2874-4acb-a245-d0209114bbb1", "query": "SELECT Path FROM db001_monitoring_stats.graphite_tagged WHERE ((Tag1='__name__=up') AND (arrayExists((x) -> x LIKE 'job=%' AND match(x, '^job=.*.*us|job'), Tags))) AND (Date >='2023-09-11') GROUP BY Path FORMAT TabSeparatedRaw", "result_bytes": "0", "read_rows": "8340", "read_bytes": "3438256", "written_rows": "0", "written_bytes": "0", "total_rows_to_read": "8340", "result_rows": "0", "query_id": "e392fbdf4112988ab2d4706e667e1efc::2af9ea555b890411", "time": 0.016665809}
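To see the precedence problem concretely, here is a small Python sketch that emulates the logged condition `x LIKE 'job=%' AND match(x, ...)`. It assumes ClickHouse `match()` behaves like Python's unanchored `re.search`. The grouped pattern is one possible fix, not what graphite-clickhouse generates. It keeps the alternation inside the value part but adds no end anchor.

```python
import re


def matches(pattern: str, tag: str) -> bool:
    # Emulates: x LIKE 'job=%' AND match(x, pattern)
    # (ClickHouse match() assumed to be an unanchored search, hence re.search).
    return tag.startswith("job=") and re.search(pattern, tag) is not None


generated = r"^job=.*.*us|job"    # as emitted for job=~.*us|job
grouped = r"^job=(?:.*.*us|job)"  # alternation kept inside the value part

for tag in ["job=node", "job=prometheus", "job=job"]:
    print(tag, matches(generated, tag), matches(grouped, tag))
# job=node True False
# job=prometheus True True
# job=job True True
```

Every tag that passes the `LIKE 'job=%'` prefilter contains `job`, so with the generated pattern the `|job` branch matches every series that has a `job` tag.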
Inconsistent behaviour.
For seriesByTag("name=up", "job=~^clickhous"), the exact-match matcher x='job=clickhous' is used:
INFO [render] query {"request_id": "79a424939f95bf061fd863d4a36baf79", "carbonapi_uuid": "5d324f43-3b43-4fdc-9e0a-f055f0c72558", "query": "SELECT Path FROM db001_monitoring_stats.graphite_tagged WHERE ((Tag1='__name__=up') AND (arrayExists((x) -> x='job=clickhous', Tags))) AND (Date >='2023-09-11') GROUP BY Path FORMAT TabSeparatedRaw", "result_rows": "0", "result_bytes": "0", "read_rows": "8222", "read_bytes": "3434759", "written_rows": "0", "written_bytes": "0", "total_rows_to_read": "8222", "query_id": "79a424939f95bf061fd863d4a36baf79::e60d4381a7c098c3", "time": 0.023963179}
but for seriesByTag("name=up", "job=~^.*clickhous"), the matcher match(x, '^job=.*clickhous') is used:
INFO [render] query {"request_id": "15ea95fffc208a157e7645eb83d8146a", "carbonapi_uuid": "3526ad2b-68d1-4243-8cb5-e32ca6780223", "query": "SELECT Path FROM db001_monitoring_stats.graphite_tagged WHERE ((Tag1='__name__=up') AND (arrayExists((x) -> x LIKE 'job=%' AND match(x, '^job=.*clickhous'), Tags))) AND (Date >='2023-09-11') GROUP BY Path FORMAT TabSeparatedRaw", "read_rows": "8222", "read_bytes": "3434759", "written_rows": "0", "written_bytes": "0", "total_rows_to_read": "8222", "result_rows": "0", "result_bytes": "0", "query_id": "15ea95fffc208a157e7645eb83d8146a::dd7ca3e39612858f", "time": 0.019437923}
So, the series with label job=clickhouse won't be returned in the first case, but will be in the second.
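A short Python sketch of the two cases against the tag `job=clickhouse`, following the conditions shown in the two logs above. Under start-anchored regex semantics, `^clickhous` is a prefix test. An equality test is therefore not an equivalent simplification, while something like `LIKE 'job=clickhous%'` (ignoring `%`/`_` escaping) would be.

```python
import re

tag = "job=clickhouse"

# Case 1 from the logs: job=~^clickhous became an equality test.
case1 = tag == "job=clickhous"

# Case 2 from the logs: job=~^.*clickhous became match(x, '^job=.*clickhous').
case2 = tag.startswith("job=") and re.search(r"^job=.*clickhous", tag) is not None

# A start-anchored regex with no other metacharacters is a prefix test.
prefix = tag.startswith("job=clickhous")

print(case1, case2, prefix)  # False True True
```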
Fixed in master.
@msaf1980, please reopen.
# echo "test;env=prod 0 $(date +%s)" | nc 127.0.0.1 2003
# echo "test;env=dr 0 $(date +%s)" | nc 127.0.0.1 2003
# curl -s '127.0.0.1:19010/render?target=seriesByTag("name=test")&format=json&from=now-1h&until=now&maxDataPoints=1' | jq .
[
  {
    "target": "test;env=dr",
    "datapoints": [
      [
        0,
        1700396220
      ]
    ],
    "tags": {
      "env": "dr",
      "name": "test"
    }
  },
  {
    "target": "test;env=prod",
    "datapoints": [
      [
        0,
        1700396220
      ]
    ],
    "tags": {
      "env": "prod",
      "name": "test"
    }
  }
]
# curl -s '127.0.0.1:19010/render?target=seriesByTag("name=test","env!=~stage|env")&format=json&from=now-1h&until=now&maxDataPoints=1' | jq .
[]
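The empty result for `env!=~stage|env` fits the alternation problem described earlier in the thread. If the pattern is built as `'^env=.*' + spec`, the bare `env` branch matches every `env` tag, and the negation then drops every series. A small Python sketch, assuming that construction (the SQL for this query is not shown here) and a start-anchored, grouped reading as the expected behaviour:

```python
import re

series = {"test;env=prod": "prod", "test;env=dr": "dr"}
spec = "stage|env"

# Hypothetical: pattern built like the earlier logs, '^env=.*' + spec,
# searched in the tag string 'env=<value>'; '!=~' keeps non-matching series.
generated = "^env=.*" + spec
kept_generated = [s for s, v in series.items()
                  if re.search(generated, "env=" + v) is None]

# Start-anchored, grouped value regex (assumed graphite-web-style semantics).
kept_anchored = [s for s, v in series.items()
                 if re.match("(?:" + spec + ")", v) is None]

print(kept_generated)  # []
print(kept_anchored)   # ['test;env=prod', 'test;env=dr']
```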
@lexx-bright, can you test against current master?
@msaf1980, thanks for the fix, but the regular expression is still not anchored:
curl -s '127.0.0.1:19020/render?target=seriesByTag("name=test")&format=json&from=now-1h&until=now&maxDataPoints=1' | jq .
[
  {
    "target": "test;env=dr",
    "datapoints": [
      [
        0,
        1715839140
      ]
    ],
    "tags": {
      "env": "dr",
      "name": "test"
    }
  },
  {
    "target": "test;env=prod",
    "datapoints": [
      [
        0,
        1715839140
      ]
    ],
    "tags": {
      "env": "prod",
      "name": "test"
    }
  }
]
curl -s '127.0.0.1:19020/render?target=seriesByTag("name=test","env=~r")&format=json&from=now-1h&until=now&maxDataPoints=1' | jq .
[
  {
    "target": "test;env=dr",
    "datapoints": [
      [
        0,
        1715839380
      ]
    ],
    "tags": {
      "env": "dr",
      "name": "test"
    }
  },
  {
    "target": "test;env=prod",
    "datapoints": [
      [
        0,
        1715839380
      ]
    ],
    "tags": {
      "env": "prod",
      "name": "test"
    }
  }
]
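Why `env=~r` returns both series: `r` occurs inside both `dr` and `prod`, so an unanchored search matches both. A start-anchored match finds neither, because neither value begins with `r`. A minimal Python sketch, modelling the expected semantics with `re.match` (an assumption about graphite-web's behaviour):

```python
import re

values = ["dr", "prod"]

# Unanchored search: the behaviour observed from graphite-clickhouse.
unanchored = [v for v in values if re.search("r", v)]

# Start-anchored match: the behaviour the reporter expects.
anchored = [v for v in values if re.match("r", v)]

print(unanchored)  # ['dr', 'prod']
print(anchored)    # []
```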
@lexx-bright Hi, regular expressions in graphite-clickhouse are intentionally not anchored at the beginning. If you really need this behavior, you can open a PR that implements it under a feature flag.
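A rough Python sketch of how such a flag could shape the pattern passed to ClickHouse `match()`. Everything here is hypothetical: `build_match_pattern` and `anchor_start` are illustrative names, not existing graphite-clickhouse code or options. `re.escape` is used for the tag name on the assumption that its backslash escapes are also accepted by ClickHouse's RE2 syntax.

```python
import re


def build_match_pattern(tag: str, value_regex: str, anchor_start: bool) -> str:
    """Regex for ClickHouse match() over 'tag=value' strings (hypothetical).

    anchor_start stands in for the suggested feature flag; it is not an
    existing graphite-clickhouse option.
    """
    prefix = "^" + re.escape(tag) + "="
    if value_regex.startswith("^"):
        # An explicit leading '^' anchors the whole value regex either way.
        return prefix + "(?:" + value_regex[1:] + ")"
    # Grouping keeps '|' inside the value part in both modes.
    if anchor_start:
        return prefix + "(?:" + value_regex + ")"
    # Current behaviour: any characters may precede the value regex.
    return prefix + ".*(?:" + value_regex + ")"


tags = ["env=dr", "env=prod"]
for flag in (False, True):
    pattern = build_match_pattern("env", "r", flag)
    print(flag, pattern, [t for t in tags if re.search(pattern, t)])
# False ^env=.*(?:r) ['env=dr', 'env=prod']
# True ^env=(?:r) []
```

In this sketch the grouping alone already changes the `env!=~stage|env` case: with the flag off, `^env=.*(?:stage|env)` no longer matches `env=prod` or `env=dr`. The flag then only decides whether a value may have arbitrary characters before the regex.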