ClickHouse/clickhouse-js

Query metrics or query id

ultram4rine opened this issue · 9 comments

Hi everyone! I'm trying to implement showing query metrics for ultram4rine/sqltools-clickhouse-driver#281, and since I'm rewriting my driver using this official lib, I have a question: is it possible to get the query_id so that I can use it in a statement like:

SELECT query_duration_ms,
  read_rows,
  read_bytes
FROM system.query_log
WHERE query_id = 'abfba2b6-9488-4c45-9186-d86b4c0d8bba'
  AND type = 2;

If not, would it be possible to implement getting the query id or metrics in this lib?

If you are using the JSON format for queries (non-streamable), it should be possible to obtain the metadata like in this example:

ts-node --transpile-only --project tsconfig.dev.json examples/select_json_with_metadata.ts 

{
  meta: [ { name: 'number', type: 'UInt64' } ],
  data: [ { number: '0' }, { number: '1' } ],
  rows: 2,
  rows_before_limit_at_least: 2,
  statistics: { elapsed: 0.001263917, rows_read: 2, bytes_read: 16 }
}
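For reference, that example boils down to something like this (a sketch only, not the exact contents of examples/select_json_with_metadata.ts; it assumes a local ClickHouse server and the createClient API of this package):

import { createClient } from '@clickhouse/client'

void (async () => {
  const client = createClient() // defaults to http://localhost:8123
  const resultSet = await client.query({
    query: 'SELECT number FROM system.numbers LIMIT 2',
    format: 'JSON', // a non-streamable format: the whole response is buffered
  })
  // For the JSON format, json() returns the entire envelope shown above,
  // including meta, rows, rows_before_limit_at_least and statistics.
  console.log(await resultSet.json())
  await client.close()
})()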

As for the query_id, I will look into what's possible

@ultram4rine I would like to know if something like this will help.

query_id will be obtainable from query, insert and exec

it('can use query_id to fetch query_log table', async () => {
  const query = 'SELECT * FROM system.numbers LIMIT 144'
  const { query_id } = await client.query({
    query,
    format: 'JSON',
  })
  // query_log is flushed every ~1000 milliseconds,
  // so this might fail a couple of times
  await retryOnFailure(
    async () => {
      const logResultSet = await client.query({
        query: `
          SELECT * FROM system.query_log
          WHERE query_id = {query_id: String}
        `,
        query_params: {
          query_id,
        },
        format: 'JSONEachRow',
      })
      const formattedQuery =
        'SELECT * FROM system.numbers LIMIT 144 \nFORMAT JSON'
      expect(await logResultSet.json()).toEqual([
        expect.objectContaining({
          type: 'QueryStart',
          query: formattedQuery,
          initial_query_id: query_id,
          query_duration_ms: expect.any(String),
          read_rows: expect.any(String),
          read_bytes: expect.any(String),
        }),
        expect.objectContaining({
          type: 'QueryFinish',
          query: formattedQuery,
          initial_query_id: query_id,
          query_duration_ms: expect.any(String),
          read_rows: expect.any(String),
          read_bytes: expect.any(String),
        }),
      ])
    },
    {
      maxAttempts: 20,
      waitBetweenAttemptsMs: 100,
    }
  )
})

It is not merged yet (see #132), but we can add it in 0.0.12.

@slvrtrn while query_id is pretty useful, using system.query_log is a bit awkward. On the other hand, with the JSON format I get statistics just like in the console or in the web interface, plus the meta columns, but I can't use a stream.
I wonder if it is possible to add these statistics and meta to the stream, like it was in the apla/node-clickhouse lib?

@ultram4rine like it is done here? https://github.com/apla/node-clickhouse#selecting-large-dataset

stream.on('end', () => {
  console.log(
    rows.length,
    stream.supplemental.rows,
    stream.supplemental.rows_before_limit_at_least, // how many rows in result are set without windowing
  )
})

@slvrtrn yes, but maybe with stats like:

stream.on('end', () => {
  console.log(
    stream.statistics.elapsed_time,
    stream.statistics.read_rows,
    stream.statistics.read_bytes,
  )
})

@ultram4rine I'll check if it's possible with the HTTP protocol and streams.

@ultram4rine our best bet here is to decode the X-ClickHouse-Progress header (see the entry about it in https://clickhouse.com/docs/en/interfaces/http#default-database), plus add some internal stopwatch to mimic the elapsed_time behavior, as it is not supported there, and emit this info in the stream;
however, I am not sure how useful it will be.

our best bet here is to decode the X-ClickHouse-Progress header (see the entry about it in https://clickhouse.com/docs/en/interfaces/http#default-database)

If we want to use the X-ClickHouse-Progress header, we will have to combine it with wait_end_of_query=1, which defeats the purpose of response streaming. See https://clickhouse.com/docs/en/interfaces/http/#response-buffering
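For reference, this is roughly what reading that header could look like over the raw HTTP interface, outside of this library (a sketch only; the local URL, default credentials, and settings are assumptions, and note that the progress header does not report elapsed time):

import * as http from 'http'

// Hypothetical illustration, not part of clickhouse-js: with
// send_progress_in_http_headers=1 ClickHouse sends X-ClickHouse-Progress
// headers; Node.js joins repeated headers of the same name with ", ".
const query = encodeURIComponent('SELECT number FROM system.numbers LIMIT 1000000')
http.get(
  `http://localhost:8123/?query=${query}` +
    '&send_progress_in_http_headers=1&wait_end_of_query=1',
  (res) => {
    // e.g. {"read_rows":"1000000","read_bytes":"8000000","total_rows_to_read":"1000000"}
    console.log(res.headers['x-clickhouse-progress'])
    res.resume() // drain the response body without reading it
  }
)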

stream.statistics.elapsed_time,
stream.statistics.read_rows,
stream.statistics.read_bytes,

I'd suggest waiting until the library implements the native protocol to receive that information from the ClickHouse server. I'd prefer not to calculate these values manually. For example, there might be a discrepancy in read_bytes, since values have different representations in C++ and in the Node.js runtime.

plus add some internal stopwatch to mimic the elapsed_time

The value we calculate on the client side will include network overhead.
The proposed logic can be implemented in the user app:

console.time('label')
await client.query(...)
console.timeEnd('label')

@ultram4rine query_id is now obtainable as of 0.0.12.
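Putting the thread together, fetching the metrics from the original question via the new query_id could look roughly like this (a sketch against 0.0.12+, assuming a local server; system.query_log is flushed periodically, so the lookup may need a retry or a short wait):

import { createClient } from '@clickhouse/client'

void (async () => {
  const client = createClient()
  const { query_id } = await client.query({
    query: 'SELECT * FROM system.numbers LIMIT 144',
    format: 'JSON',
  })
  // type = 'QueryFinish' corresponds to type = 2 from the original question;
  // system.query_log is flushed periodically, so this may need a retry.
  const metrics = await client.query({
    query: `
      SELECT query_duration_ms, read_rows, read_bytes
      FROM system.query_log
      WHERE query_id = {query_id: String}
        AND type = 'QueryFinish'
    `,
    query_params: { query_id },
    format: 'JSONEachRow',
  })
  console.log(await metrics.json())
  await client.close()
})()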