go-graphite/graphite-clickhouse

seriesByTag invalid regular expression parsing

lexx-bright opened this issue · 5 comments

For seriesByTag("name=up", "job=~us"), the matcher match(x, '^job=.*us') is used, which contradicts graphite-web:
https://github.com/graphite-project/graphite-web/blob/e058266f8afc293250ee32fd30e3bce4b7ab3579/webapp/graphite/render/functions.py#L5735

INFO [render] query {"request_id": "8bc07b855d62a36520b072379955ff25", "carbonapi_uuid": "6c6fc662-5c65-4ab2-8d19-de96e5be708e", "query": "SELECT Path FROM db001_monitoring_stats.graphite_tagged  WHERE ((Tag1='__name__=up') AND (arrayExists((x) -> x LIKE 'job=%' AND match(x, '^job=.*us'), Tags))) AND (Date >='2023-09-11') GROUP BY Path FORMAT TabSeparatedRaw", "result_rows": "0", "result_bytes": "0", "read_rows": "8400", "read_bytes": "3440365", "written_rows": "0", "written_bytes": "0", "total_rows_to_read": "8400", "query_id": "8bc07b855d62a36520b072379955ff25::abb506b6cbb48744", "time": 0.027178179}
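To make the mismatch concrete, here is a minimal Python sketch. It assumes graphite-web's documented behaviour that `=~` patterns are anchored at the start of the tag value. The helper names and sample tags are hypothetical, and Python's `re` module stands in for ClickHouse's `match`.

```python
import re

def generated_filter(tag: str, name: str, pattern: str) -> bool:
    """Mimic the logged filter: x LIKE 'name=%' AND match(x, '^name=.*pattern')."""
    prefix = name + "="
    regex = "^" + re.escape(prefix) + ".*" + pattern
    return tag.startswith(prefix) and re.search(regex, tag) is not None

def anchored_filter(tag: str, name: str, pattern: str) -> bool:
    """Assumed reference behaviour: the pattern must match at the start of the value."""
    key, _, value = tag.partition("=")
    return key == name and re.match("(?:" + pattern + ")", value) is not None

for tag in ("job=us-east", "job=clickhouse"):
    print(tag, generated_filter(tag, "job", "us"), anchored_filter(tag, "job", "us"))
```

Under these assumptions `job=clickhouse` passes the generated filter (because of the leading `.*`) but not the anchored one, while `job=us-east` passes both.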

For seriesByTag("name=up", "job=~.*us|job"), the matcher match(x, '^job=.*.*us|job') is used, which is equivalent to match(x, '^job=.*.*us') OR match(x, 'job'), but the expected matcher is match(x, '^job=.*.*us') OR x = 'job=job'

INFO [render] query {"request_id": "e392fbdf4112988ab2d4706e667e1efc", "carbonapi_uuid": "d911c941-2874-4acb-a245-d0209114bbb1", "query": "SELECT Path FROM db001_monitoring_stats.graphite_tagged  WHERE ((Tag1='__name__=up') AND (arrayExists((x) -> x LIKE 'job=%' AND match(x, '^job=.*.*us|job'), Tags))) AND (Date >='2023-09-11') GROUP BY Path FORMAT TabSeparatedRaw", "result_bytes": "0", "read_rows": "8340", "read_bytes": "3438256", "written_rows": "0", "written_bytes": "0", "total_rows_to_read": "8340", "result_rows": "0", "query_id": "e392fbdf4112988ab2d4706e667e1efc::2af9ea555b890411", "time": 0.016665809}
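The precedence problem can be reproduced outside ClickHouse. The sketch below is only an illustration: Python's `re` stands in for `match`, the function names are hypothetical, and `expected` simply transcribes the expression given above.

```python
import re

def generated(tag: str) -> bool:
    # '|' has the lowest precedence, so this parses as (^job=.*.*us) | (job).
    return re.search(r"^job=.*.*us|job", tag) is not None

def expected(tag: str) -> bool:
    # Transcription of: match(x, '^job=.*.*us') OR x = 'job=job'
    return re.search(r"^job=.*.*us", tag) is not None or tag == "job=job"

for tag in ("job=clickhouse", "job=job", "job=api"):
    print(tag, generated(tag), expected(tag))
```

Because every tag that passes `x LIKE 'job=%'` contains the substring `job`, the second alternative of the generated pattern accepts any `job=` tag, including `job=api`, which the expected expression rejects.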

Inconsistent behaviour:
For seriesByTag("name=up", "job=~^clickhous"), the matcher x='job=clickhous' is used

INFO [render] query {"request_id": "79a424939f95bf061fd863d4a36baf79", "carbonapi_uuid": "5d324f43-3b43-4fdc-9e0a-f055f0c72558", "query": "SELECT Path FROM db001_monitoring_stats.graphite_tagged  WHERE ((Tag1='__name__=up') AND (arrayExists((x) -> x='job=clickhous', Tags))) AND (Date >='2023-09-11') GROUP BY Path FORMAT TabSeparatedRaw", "result_rows": "0", "result_bytes": "0", "read_rows": "8222", "read_bytes": "3434759", "written_rows": "0", "written_bytes": "0", "total_rows_to_read": "8222", "query_id": "79a424939f95bf061fd863d4a36baf79::e60d4381a7c098c3", "time": 0.023963179}

but for seriesByTag("name=up", "job=~^.*clickhous"), the matcher match(x, '^job=.*clickhous') is used

INFO [render] query {"request_id": "15ea95fffc208a157e7645eb83d8146a", "carbonapi_uuid": "3526ad2b-68d1-4243-8cb5-e32ca6780223", "query": "SELECT Path FROM db001_monitoring_stats.graphite_tagged  WHERE ((Tag1='__name__=up') AND (arrayExists((x) -> x LIKE 'job=%' AND match(x, '^job=.*clickhous'), Tags))) AND (Date >='2023-09-11') GROUP BY Path FORMAT TabSeparatedRaw", "read_rows": "8222", "read_bytes": "3434759", "written_rows": "0", "written_bytes": "0", "total_rows_to_read": "8222", "result_rows": "0", "result_bytes": "0", "query_id": "15ea95fffc208a157e7645eb83d8146a::dd7ca3e39612858f", "time": 0.019437923}

So, the series with label job=clickhouse won't be returned in the first case, but will be in the second.
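A small Python sketch of the two generated conditions shows the difference. The function names are hypothetical and `re.search` stands in for ClickHouse's `match`.

```python
import re

def first_case(tag: str) -> bool:
    # Generated for job=~^clickhous: plain equality, x = 'job=clickhous'
    return tag == "job=clickhous"

def second_case(tag: str) -> bool:
    # Generated for job=~^.*clickhous: match(x, '^job=.*clickhous')
    return re.search(r"^job=.*clickhous", tag) is not None

tag = "job=clickhouse"
print(first_case(tag), second_case(tag))
```

The equality check requires the whole tag to be `job=clickhous`, whereas the regular expression only requires a prefix match, so the two forms disagree on `job=clickhouse`.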

Fixed in master.

@msaf1980, please reopen.

# echo "test;env=prod 0 $(date +%s)" | nc 127.0.0.1 2003
# echo "test;env=dr 0 $(date +%s)" | nc 127.0.0.1 2003
# curl -s '127.0.0.1:19010/render?target=seriesByTag("name=test")&format=json&from=now-1h&until=now&maxDataPoints=1' | jq .
[
  {
    "target": "test;env=dr",
    "datapoints": [
      [
        0,
        1700396220
      ]
    ],
    "tags": {
      "env": "dr",
      "name": "test"
    }
  },
  {
    "target": "test;env=prod",
    "datapoints": [
      [
        0,
        1700396220
      ]
    ],
    "tags": {
      "env": "prod",
      "name": "test"
    }
  }
]
# curl -s '127.0.0.1:19010/render?target=seriesByTag("name=test","env!=~stage|env")&format=json&from=now-1h&until=now&maxDataPoints=1' | jq .
[]
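The generated SQL for this request is not shown, so the following Python sketch is only a guess at the mechanism. It assumes the negated pattern has the same ungrouped shape as in the earlier job= logs; the `kept` helper, the series dictionary, and both patterns are hypothetical.

```python
import re

series = {
    "test;env=prod": ["__name__=test", "env=prod"],
    "test;env=dr": ["__name__=test", "env=dr"],
}

def kept(pattern: str) -> list:
    """Return the series that have no env tag matching pattern (a negated match)."""
    rx = re.compile(pattern)
    return sorted(
        name for name, tags in series.items()
        if not any(t.startswith("env=") and rx.search(t) for t in tags)
    )

ungrouped = kept(r"^env=.*stage|env")  # assumed shape: the bare 'env' alternative matches every env tag
grouped = kept(r"^env=(?:stage|env)")  # alternation confined to the tag value
print(ungrouped, grouped)
```

If that assumption holds, the ungrouped pattern excludes every series carrying an env tag, which would explain the empty result, while the grouped form keeps both `test;env=dr` and `test;env=prod`.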

@lexx-bright Can you test against current master?

@msaf1980, thanks for the fix, but the regular expression is still not anchored:

curl -s '127.0.0.1:19020/render?target=seriesByTag("name=test")&format=json&from=now-1h&until=now&maxDataPoints=1' | jq .
[
  {
    "target": "test;env=dr",
    "datapoints": [
      [
        0,
        1715839140
      ]
    ],
    "tags": {
      "env": "dr",
      "name": "test"
    }
  },
  {
    "target": "test;env=prod",
    "datapoints": [
      [
        0,
        1715839140
      ]
    ],
    "tags": {
      "env": "prod",
      "name": "test"
    }
  }
]

curl -s '127.0.0.1:19020/render?target=seriesByTag("name=test","env=~r")&format=json&from=now-1h&until=now&maxDataPoints=1' | jq .
[
  {
    "target": "test;env=dr",
    "datapoints": [
      [
        0,
        1715839380
      ]
    ],
    "tags": {
      "env": "dr",
      "name": "test"
    }
  },
  {
    "target": "test;env=prod",
    "datapoints": [
      [
        0,
        1715839380
      ]
    ],
    "tags": {
      "env": "prod",
      "name": "test"
    }
  }
]
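The same difference can be shown with the two tag values alone. This is a minimal Python sketch: `re.search` models an unanchored match, and `re.match` models a match anchored at the start of the value, which is the behaviour assumed here for graphite-web.

```python
import re

values = ["dr", "prod"]

# Unanchored: 'r' may appear anywhere in the value.
unanchored = [v for v in values if re.search("r", v)]

# Anchored at the start of the value: the value must begin with 'r'.
anchored = [v for v in values if re.match("r", v)]

print(unanchored, anchored)
```

With an unanchored pattern both `dr` and `prod` are selected, as in the output above. With an anchored pattern neither value starts with `r`, so the result would be empty.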

@lexx-bright Hi, regular expressions in graphite-clickhouse are intentionally not anchored at the beginning. If you really need this behavior, you can open a PR that implements it under a feature flag.