Trino's S3 client no longer has a recognizable HTTP user agent header value
metadaddy opened this issue · 3 comments
The now-removed legacy S3 client in Trino appended the suffix ", Trino" to the User-Agent HTTP header value when accessing S3-compatible providers via the AWS SDK for Java. This was a great aid to observability, allowing S3 providers to reliably identify traffic from Trino.
The new native S3 file system does not add any distinguishing identifier to the User-Agent HTTP header value. A typical header value looks like:
aws-sdk-java/2.29.24 md/io#sync md/http#Apache ua/2.1 os/Mac_OS_X#14.7.1 lang/java#23.0.1 md/OpenJDK_64-Bit_Server_VM#23.0.1+11 md/vendor#Eclipse_Adoptium md/en_US md/kotlin/2.1.0-release-394 cfg/auth-source#stat m/D,N,N
This contains a lot of information in a more-or-less digestible form, but no clue that the request comes from Trino.
The placement of "Trino" as a suffix was always anomalous. Most vendors that identify their client in the User-Agent HTTP header value do so with a prefix. At Backblaze, we most commonly see user-agent prefixes of the form:
APN/1.0 {COMPANY}/1.0 {PRODUCT}/{VERSION}, {DEFAULT_AWS_SDK_USER_AGENT_STRING}
Where:
- APN = AWS Partner Network
- COMPANY = company name
- PRODUCT = product name
- VERSION = product version identifier
- DEFAULT_AWS_SDK_USER_AGENT_STRING = the default User-Agent HTTP header value used by the AWS SDK
It's hard to find a canonical source for this format; the closest I've found is at aws/aws-sdk-go-v2#1432 (comment).
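A minimal sketch of assembling a header value in this form, assuming plain string concatenation. The class and method names (UserAgentPrefix, prefix, fullUserAgent) are hypothetical, and this is not how the SDK builds the header internally. The validation rule (no whitespace, "/" or "," inside a token) is also an assumption, added so the result stays easy to parse:

```java
import java.util.regex.Pattern;

public class UserAgentPrefix {
    // Assumption: each token must be free of whitespace, '/' and ','
    // so that the space-separated prefix can be parsed back unambiguously.
    private static final Pattern TOKEN = Pattern.compile("[^\\s/,]+");

    private static String requireToken(String name, String value) {
        if (value == null || !TOKEN.matcher(value).matches()) {
            throw new IllegalArgumentException(name + " is not a valid token: " + value);
        }
        return value;
    }

    // Builds "APN/1.0 {COMPANY}/1.0 {PRODUCT}/{VERSION}" following the format above.
    static String prefix(String company, String product, String version) {
        return "APN/1.0 " + requireToken("company", company)
                + "/1.0 " + requireToken("product", product)
                + "/" + requireToken("version", version);
    }

    // Joins the prefix and the SDK default with ", ", matching the shape seen in logs.
    static String fullUserAgent(String prefix, String sdkDefault) {
        return prefix + ", " + sdkDefault;
    }

    public static void main(String[] args) {
        System.out.println(prefix("HYCU", "HYCU4EC", "5.0.0"));
        System.out.println(fullUserAgent(prefix("Trino", "TrinoServer", "466"), "aws-sdk-java/2.29.24"));
    }
}
```

This sketch does not assume that the SDK inserts the ", " separator itself when a value is supplied through USER_AGENT_PREFIX; fullUserAgent only reproduces the final shape seen in the logged examples.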
A concrete example from our logs is:
APN/1.0 HYCU/1.0 HYCU4EC/5.0.0, aws-sdk-java/1.12.440 Linux/6.2.8-200.hycu20231010.fc37.x86_64 OpenJDK_64-Bit_Server_VM/25.412-b08 java/1.8.0_412 scala/2.10.4 kotlin/1.2.71 vendor/Red_Hat,_Inc. cfg/retry-mode/legacy
Here, the company identifier is HYCU (a backup product vendor), the product identifier is HYCU4EC (HYCU for EC2), and the product version is 5.0.0.
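On the provider side, a prefix in this form makes the client easy to pick out of request logs. A minimal sketch of extracting the three fields with a regular expression, assuming the prefix always starts with "APN/1.0" and ends at the first comma. The class, record, and method names are hypothetical:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserAgentPrefixParser {
    // Company, product and version as identified by the prefix convention.
    record ClientId(String company, String product, String version) {}

    // Matches "APN/1.0 {COMPANY}/1.0 {PRODUCT}/{VERSION}," at the start of the header.
    private static final Pattern PREFIX = Pattern.compile(
            "^APN/1\\.0 ([^\\s/,]+)/1\\.0 ([^\\s/,]+)/([^\\s,]+),");

    static Optional<ClientId> parse(String userAgent) {
        Matcher m = PREFIX.matcher(userAgent);
        if (!m.find()) {
            // No recognizable prefix, e.g. a bare SDK default value.
            return Optional.empty();
        }
        return Optional.of(new ClientId(m.group(1), m.group(2), m.group(3)));
    }

    public static void main(String[] args) {
        String ua = "APN/1.0 HYCU/1.0 HYCU4EC/5.0.0, aws-sdk-java/1.12.440 java/1.8.0_412";
        System.out.println(parse(ua));
        System.out.println(parse("aws-sdk-java/2.29.24 md/io#sync md/http#Apache ua/2.1"));
    }
}
```

A header with no prefix, such as the native S3 file system's default value quoted earlier, yields Optional.empty(), which is exactly the observability gap this issue describes.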
A suitable user-agent prefix for Trino might use Trino as the company name and TrinoServer as the product name, yielding a user agent value such as:
APN/1.0 Trino/1.0 TrinoServer/466, aws-sdk-java/2.29.24 md/io#sync md/http#Apache ua/2.1 os/Mac_OS_X#14.7.1 lang/java#23.0.1 md/OpenJDK_64-Bit_Server_VM#23.0.1+11 md/vendor#Eclipse_Adoptium md/en_US md/kotlin/2.1.0-release-394 cfg/auth-source#stat m/D,N,N
The AWS SDK for Java 2.0 recommends setting the user agent prefix on the client override configuration builder via the construction:
overrideConfig.putAdvancedOption(
        SdkAdvancedClientOption.USER_AGENT_PREFIX, ...)
I'm happy to submit a PR with this addition to S3FileSystemLoader.createOverrideConfiguration().
I've been staring at the source code for about two hours trying to figure out how an S3FileSystemLoader can discover the version of the server it's running on. The key seems to be to somehow allow an S3FileSystemLoader to get an instance of NodeManager, then call getCurrentNode().getVersion(). FileSystemModule has a nodeManager, but I don't know Guice well enough to figure out how it can give that to S3FileSystemLoader.
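One way to frame the problem in plain Java is that the loader should be handed the version rather than discover it. A hypothetical sketch: VersionSource stands in for something like NodeManager.getCurrentNode().getVersion(), and HypotheticalS3Loader stands in for S3FileSystemLoader. None of these names are Trino's actual API, and falling back to a bare "Trino" when no version is known is an assumption. With Guice, the constructor would typically carry @Inject so the injector supplies the dependency; that wiring is not shown here:

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: all names are illustrative, not Trino's actual classes.
public class VersionedLoaderSketch {
    // Stand-in for whatever exposes the current node's version.
    interface VersionSource extends Supplier<Optional<String>> {}

    static final class HypotheticalS3Loader {
        private final VersionSource versionSource;

        // Under Guice this constructor would typically be annotated @Inject;
        // here the dependency is passed in by hand.
        HypotheticalS3Loader(VersionSource versionSource) {
            this.versionSource = versionSource;
        }

        // Builds the prefix when a version is known, else falls back to "Trino" (assumption).
        String userAgentPrefix() {
            return versionSource.get()
                    .map(v -> "APN/1.0 Trino/1.0 TrinoServer/" + v)
                    .orElse("Trino");
        }
    }

    public static void main(String[] args) {
        HypotheticalS3Loader withVersion = new HypotheticalS3Loader(() -> Optional.of("466"));
        HypotheticalS3Loader withoutVersion = new HypotheticalS3Loader(Optional::empty);
        System.out.println(withVersion.userAgentPrefix());
        System.out.println(withoutVersion.userAgentPrefix());
    }
}
```

Keeping the loader ignorant of where the version comes from means it can be unit-tested with a fixed value, as in main above.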
See #24361, which provides config for this with a default value of Trino (but no version).
#24427 is ugly but shows the general changes you might need to make. I did it only for the Azure FS, but the specific FS doesn't really matter here.
Also note that the AWS SDK itself suggests using an "application id" instead of separate UA prefix and suffix fields, which are a legacy of SDK v1. All the other cloud SDKs use the "app id" idea too.