w3c/server-timing

add start time to distinguish between cached and active entries

Opened this issue ยท 6 comments

The UA may receive Server-Timing entries that were cached (eg: by a CDN), provide ancestral detail but may not be relevant to visualizing the current timeline from the UA's perspective. To allow the UA to distinguish between recent and historical Server-Timing values, servers should add a start= timestamp. This way the UA can distinguish between recent Server-Timing entries and past entries when visualizing the timeline.

example

For example, the following request:
curl -D - -o /dev/null -s https://colinbendell.cloudinary.com/image/upload/c_crop,f_auto,g_auto,h_350,w_400/v2/Wizard-Clap-by-Markus-Magnusson.gif

Consider the two responses:

date: Wed, 13 Dec 2017 04:04:39 GMT
server: cloudinary
content-type: image/gif
content-length: 580970

server-timing: cloudinary;dur=55;start=2017-12-13T04:04:39.239Z;total=55
server-timing: akam;dur=59;start=2017-12-13T04:04:39Z;total=114;desc="miss"
server-timing: fastly;dur=48;start=2017-12-13T04:04:38.677Z;total=162;desc="miss"

Here we have UA -> fastly -> akamai -> Origin and all layers are active in this response which populates the CDN layers with a cach miss.

A few minutes later we re-request the resource and we get:

date: Wed, 13 Dec 2017 06:25:29 GMT
server: cloudinary
content-type: image/gif
content-length: 580970

server-timing: cloudinary;dur=55;start=2017-12-13T04:04:39.239Z;total=55
server-timing: akam;dur=1;start=2017-12-13T06:25:29Z;total=1;desc="hit"
server-timing: akam;dur=13;start=2017-12-13T06:25:29Z;total=14;desc="miss"
server-timing: fastly;dur=36;start=2017-12-13T06:25:28.991Z;total=50;desc="miss"

now we have UA -> CDN -> CDN -> CDN.

In the second example, the UA had a total TTFB of 116ms.

Using Server-Timing the UA is now able to distinguish between network time and server time. However, if the UA simply summed all the Server-Timing dur= values it would yield a total of 105ms as server think time. In reality 55ms of that 105ms total should not apply to visualizing this request.

To solve this, the UA should look at the start time and ignore entries that happen before the requested start time.

Can we trust the start time? What about time drift?

In this scenario, we can trust the start time because we are only worried about approximation to the request time to the UA as a mechanism to ignore older entries. In reality, servers can often be out of sync by 10s of milliseconds (maybe as high as a few hundreds). Modern infrastructure makes this less pervasive since encryption often requires a higher level of time synchronization. Regardless, this is intended to be directional and allow the UA to determine order and filter out 'past' entries, vs 'current' entries.

Why not mark the 'last' server-timing entry as last?

I'm assuming the principle that each layer in the chain should have isolated awareness. that is, a request should only be responsible for adding itself and a) should never touch another server-timing entry and b) should never need to be aware of other server-timing entries that may or may not be present. In short, each layer should be ignorant of any other timing metrics in the eco system and therefore oblivious if it is the last or first hop in the chain.

Shouldn't only the current server-timing entries be sent?

Just as the principle of isolated awareness, each hop should not be required to expunge past entries (though this may be desirable for business reasons). The ancestral details that Server-Timing provide are insightful for other purposes - debugging, triaging, utilization analytics, etc. For example, the Server-Timing values in the description may describe the A:B state from the origin and useful when combining with an analytics engine. The preservation of historical server-timing should be a business decision and not the responsibility of the spec.

[edit: remove extraneous desc details]
[edit: changed from epoch to ISO8601. In our implementation, we found that it was mostly humans looking at this value and humans can't read epoch very well. switching to ios8601 made for this to be more practically useful in production at the expense of a few extra characters]

I would use at keyword for that and leave start to denote the "start time" within processed event itself. That would allow server-timing to be used to display real trace of the events that happened on the server.

Are you suggesting this would allow you to group at on a process layer with different start for each inner-process? Can you give a proposed example?

@colinbendell sorry for late response but the idea is to use start keyword as a millisecond/nanosecond since imaginary start of the request handling by the server, so for example with headers like:

server-timing: request;dur=40;start=0
server-timing: db;dur=10;start=10
server-timing: json;dur=5;start=35

Would result in diagram like:

|--------------------------------------| request
          |--------| db
                                   |---| json

And start would be by default set to 0 (however no timing would be required to have start=0, so in theory all can start after starting handling request).

EDIT:

However I now see that this is just different approach to #44, but instead of total which is "end" time it uses start as "begin" time.

noamr commented

Since server timing exposes timings that come from different clocks, which makes not 100% comparable, and OTOH considering that this data could be useful for DevTools et al, I believe perhaps a reasonable way to look at it is that one of the attributes would provide a start time approximation. I believe this would also solve #44.

Developers that want to look at it or at waterfalls created based on this data should be aware of the possible clock differences and that any waterfall is an approximation, though in some cases still useful.

Perhaps approx_start=${ISO-8601}?

/cc @yoavweiss @paulirish

Lack of a standard for "start time" means that different vendors are implementing their own incompatible solutions:

Fastly is going with:

Server-Timing: time-start-msec;dur=1544707555517,time-elapsed;dur=81

Cloudinary is going with:

server-timing: cloudinary;dur=94;start=1516653847.986;total=94

One of the biggest problem I see there is that there is no API for fetching extra metadata from the entry (see #72). This would allow digesters to circumvent this problem in more uniform way.