ClickHouse/clickhouse-java

Make sure User-Agent is filled correctly

chernser opened this issue

Describe the bug

ClickHouse Java Client is used by many connectors and products that integrate with ClickHouse. When an application uses the client to contact a server, the client sends a User-Agent header that carries very important information. Part of this information is used for analytics and decision making.
The current implementation may not be strict enough here, because we see strings like:
metabase/1.3.3/unknown (Linux/6.1.0-12-amd64; OpenJDK 64-Bit Server VM/Temurin-11.0.22+7; HttpURLConnection; rv:unknown)

There is no 100% guarantee that such an integration actually used the expected JDBC driver. It is even harder to match an integration version with the expected Java client version and all of its core components (for example, whether Apache HttpClient or HttpURLConnection was used).
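On the analytics side, this concern boils down to pulling the top-level products out of a header value and looking for the Java client token. The sketch below is hypothetical (UserAgentScan and its methods are illustrative names, not part of clickhouse-java). It assumes products are separated by spaces or tabs, skips parenthesized comments (including nested ones and backslash escapes), and does not validate the tokens themselves.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extracts top-level products from a User-Agent value. */
final class UserAgentScan {
    private UserAgentScan() {}

    /** Splits a User-Agent value into top-level products, skipping "( ... )" comments. */
    static List<String> products(String userAgent) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // comment nesting depth; comments may nest per RFC 9110
        for (int i = 0; i < userAgent.length(); i++) {
            char c = userAgent.charAt(i);
            if (depth > 0) {
                if (c == '\\') {
                    i++; // quoted-pair inside a comment: skip the escaped character
                } else if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                }
                continue;
            }
            if (c == '(') {
                flush(current, result);
                depth = 1;
            } else if (c == ' ' || c == '\t') {
                flush(current, result);
            } else {
                current.append(c);
            }
        }
        flush(current, result);
        return result;
    }

    private static void flush(StringBuilder sb, List<String> out) {
        if (sb.length() > 0) {
            out.add(sb.toString());
            sb.setLength(0);
        }
    }

    /** Version of the named product, "" if it has no version, or null if the product is absent. */
    static String versionOf(String userAgent, String productName) {
        for (String p : products(userAgent)) {
            int slash = p.indexOf('/');
            String name = slash < 0 ? p : p.substring(0, slash);
            if (name.equals(productName)) {
                return slash < 0 ? "" : p.substring(slash + 1);
            }
        }
        return null;
    }
}
```

For the metabase string quoted earlier, versionOf(ua, "ClickHouse-JavaClient") returns null: the Java client is not identifiable at all, which is exactly the case this issue wants to rule out.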

Expected behaviour

Whatever integration is used and whatever User-Agent header is passed, the header should contain information from the Java client.
See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent for reference.

According to the standard https://www.rfc-editor.org/rfc/rfc9110#field.user-agent, the format of the header is:

  User-Agent      = product *( RWS ( product / comment ) )

where:

  product         = token ["/" product-version]
  product-version = token
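The grammar relies on token from RFC 9110 (section 5.6.2): one or more tchar, where a tchar is ALPHA, DIGIT, or one of the symbols ! # $ % & ' * + - . ^ _ ` | ~. A minimal validator for a single product might look like this; ProductToken is a hypothetical helper, not anything that exists in clickhouse-java.

```java
/**
 * Hypothetical validator for one product in the RFC 9110 User-Agent grammar:
 *   product         = token ["/" product-version]
 *   product-version = token
 */
final class ProductToken {
    // The non-alphanumeric tchar symbols listed in RFC 9110, section 5.6.2.
    private static final String TCHAR_SYMBOLS = "!#$%&'*+-.^_`|~";

    private ProductToken() {}

    /** True if c is a tchar: ALPHA, DIGIT, or one of TCHAR_SYMBOLS. */
    static boolean isTchar(char c) {
        return (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || TCHAR_SYMBOLS.indexOf(c) >= 0;
    }

    /** True if s is a non-empty sequence of tchars. */
    static boolean isToken(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isTchar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** True if s matches product = token ["/" product-version]. */
    static boolean isProduct(String s) {
        if (s == null) {
            return false;
        }
        int slash = s.indexOf('/');
        if (slash < 0) {
            return isToken(s);
        }
        return isToken(s.substring(0, slash)) && isToken(s.substring(slash + 1));
    }
}
```

By this check, ClickHouse-JavaClient/0.6.3 is a valid product, while metabase/1.3.3/unknown is not a single valid product, because "/" is not a tchar and so cannot appear inside product-version.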

Today's value looks like this:
ClickHouse-JavaClient/unknown (Windows 11/10.0; OpenJDK 64-Bit Server VM/JBR-17.0.9+8-1166.2-nomod; Apache-HttpClient/5.2.1; rv:unknown)

This is long, but fine. However, we need to make sure that:

  • If someone sets a custom User-Agent header value, the Java client identifier is still added to the User-Agent header
  • If someone sets a client name, the Java client identifier is still added to the User-Agent header

The JavaClient identifier should always be present. A version of "unknown" has no value for us.
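The requirements above can be sketched as one rule: a custom User-Agent value or a client name is kept in front, and ClickHouse-JavaClient/<version> is always appended unless it is already there, while an unknown version is rejected outright. This is a hypothetical helper; the class name JavaClientUserAgent and the reject-on-unknown policy are assumptions, and the duplicate check is deliberately simple (it only inspects whitespace-separated parts).

```java
/** Hypothetical helper: guarantees the Java client identifier is part of the User-Agent. */
final class JavaClientUserAgent {
    static final String PRODUCT = "ClickHouse-JavaClient";

    private JavaClientUserAgent() {}

    /**
     * @param customValue   custom User-Agent value or client name; may be null or empty
     * @param clientVersion real Java client version, e.g. "0.6.3"
     */
    static String compose(String customValue, String clientVersion) {
        // Assumption: an unknown version is a configuration error, not something to send.
        if (clientVersion == null || clientVersion.isEmpty() || "unknown".equals(clientVersion)) {
            throw new IllegalArgumentException("Java client version must be known, got: " + clientVersion);
        }
        String javaClient = PRODUCT + "/" + clientVersion;
        if (customValue == null || customValue.trim().isEmpty()) {
            return javaClient;
        }
        String trimmed = customValue.trim();
        for (String part : trimmed.split("\\s+")) {
            if (part.startsWith(PRODUCT + "/")) {
                return trimmed; // identifier already present: keep the value, do not duplicate it
            }
        }
        // Custom part first, Java client identifier appended, never replaced.
        return trimmed + " " + javaClient;
    }
}
```

For example, compose("MyCustomClient", "0.6.3") yields "MyCustomClient ClickHouse-JavaClient/0.6.3", and compose(null, "0.6.3") yields just the Java client product.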

Code references:
com.clickhouse.client.http.ClickHouseHttpConnection#getUserAgent - overrides the User-Agent if com.clickhouse.client.config.ClickHouseClientOption#CLIENT_NAME is set. This is incorrect. There is com.clickhouse.client.http.config.ClickHouseHttpOption#SEND_HTTP_CLIENT_ID to set the Referer header, which is the correct way to identify a particular client. Again, User-Agent tells the class of software used for communication, not the exact application instance.

com.clickhouse.client.http.ClickHouseHttpConnection#createDefaultHeaders - is another place where the client name may override the User-Agent header. This is not correct.
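Applied to a header map, as in createDefaultHeaders, the same idea might look like the sketch below. Assumptions: custom headers arrive as a plain map, field names are compared case-insensitively (HTTP field names are case-insensitive, so user-agent and User-Agent are one header), and the client name is sent in Referer, following the SEND_HTTP_CLIENT_ID remark above. DefaultHeadersSketch is hypothetical and does not reproduce the actual clickhouse-java option semantics.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: default headers where the client name never replaces User-Agent. */
final class DefaultHeadersSketch {
    private DefaultHeadersSketch() {}

    static Map<String, String> build(Map<String, String> customHeaders, String clientName,
                                     String javaClientProduct) {
        // Case-insensitive keys: "user-agent" and "User-Agent" address the same entry.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        if (customHeaders != null) {
            headers.putAll(customHeaders);
        }
        String customUa = headers.get("User-Agent");
        String ua;
        if (customUa == null || customUa.trim().isEmpty()) {
            ua = javaClientProduct;
        } else if (customUa.contains(javaClientProduct)) {
            ua = customUa.trim(); // already carries the identifier
        } else {
            ua = customUa.trim() + " " + javaClientProduct; // append, never replace
        }
        // Replaces the value; TreeMap keeps the caller's original key spelling.
        headers.put("User-Agent", ua);
        // Assumption: the client name identifies the application instance, so it goes to
        // Referer (unless the caller already set one), not into User-Agent.
        if (clientName != null && !clientName.isEmpty()) {
            headers.putIfAbsent("Referer", clientName);
        }
        return headers;
    }
}
```

A caller that passes {"user-agent": "MyCustomClient"} and the client name my-app-1 ends up with User-Agent "MyCustomClient ClickHouse-JavaClient/0.6.3" and Referer "my-app-1".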

com.clickhouse.client.config.ClickHouseClientOption#buildUserAgent - has another twist: it takes PRODUCT_NAME from the environment (PRODUCT_NAME.getEffectiveDefaultValue()).

Optional tasks:

  • Remove the rv: part because it duplicates the product version from the main part.
  • Remove the OS part because it looks useless for Java users.
  • Shorten the Java info because only the version makes sense for statistics.
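Taken together, the optional tasks amount to a much shorter comment: no OS part, no rv: part, and only the Java version. A sketch under that assumption; the "Java/<version>" label inside the comment is a placeholder of my own, while java.version is a standard system property that every JVM sets.

```java
/** Hypothetical sketch of the shortened comment proposed in the optional tasks. */
final class ShortComment {
    private ShortComment() {}

    /** Builds "(Java/<version>; <httpClient>)" with no OS part and no duplicated "rv:" part. */
    static String build(String javaVersion, String httpClient) {
        StringBuilder sb = new StringBuilder("(Java/").append(javaVersion);
        if (httpClient != null && !httpClient.isEmpty()) {
            sb.append("; ").append(httpClient);
        }
        return sb.append(')').toString();
    }

    /** Same, using the running JVM's "java.version" system property. */
    static String forCurrentJvm(String httpClient) {
        return build(System.getProperty("java.version"), httpClient);
    }
}
```

With this shape, the Apache example above would shrink to a comment like "(Java/17.0.9; Apache-HttpClient/5.2.1)".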

@chernser let's also add this metadata for the v2 API

Updated format: MyCustomClient ClickHouse-JavaClient/0.6.3 (OpenJDK 64-Bit Server VM/Corretto-17.0.6.10.1; HttpURLConnection)
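One way to produce exactly this updated format, as a hypothetical sketch. It assumes the JVM part is java.vm.name plus java.vendor.version (which matches strings such as OpenJDK 64-Bit Server VM/Corretto-17.0.6.10.1) and falls back to java.version on JDKs that do not set java.vendor.version. The client version is passed in rather than looked up, so it can never silently degrade to unknown.

```java
/** Hypothetical formatter for "<custom> ClickHouse-JavaClient/<version> (<vm>/<vm version>; <http client>)". */
final class UpdatedUserAgent {
    static final String PRODUCT = "ClickHouse-JavaClient";

    private UpdatedUserAgent() {}

    static String format(String customProduct, String clientVersion,
                         String vmName, String vmVersion, String httpClient) {
        if (clientVersion == null || clientVersion.isEmpty() || "unknown".equals(clientVersion)) {
            throw new IllegalArgumentException("Java client version must be known");
        }
        StringBuilder sb = new StringBuilder();
        if (customProduct != null && !customProduct.isEmpty()) {
            sb.append(customProduct).append(' '); // custom product stays in front
        }
        sb.append(PRODUCT).append('/').append(clientVersion)
          .append(" (").append(vmName).append('/').append(vmVersion)
          .append("; ").append(httpClient).append(')');
        return sb.toString();
    }

    /** Fills the JVM part from system properties; java.vendor.version is optional, so fall back. */
    static String formatForCurrentJvm(String customProduct, String clientVersion, String httpClient) {
        String vendorVersion = System.getProperty("java.vendor.version");
        String vmVersion = (vendorVersion == null || vendorVersion.isEmpty())
                ? System.getProperty("java.version")
                : vendorVersion;
        return format(customProduct, clientVersion, System.getProperty("java.vm.name"), vmVersion, httpClient);
    }
}
```

Calling format("MyCustomClient", "0.6.3", "OpenJDK 64-Bit Server VM", "Corretto-17.0.6.10.1", "HttpURLConnection") returns the updated format string above verbatim.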