w3c/network-error-logging

Signed Exchange Reporting

horo-t opened this issue · 21 comments

I want to extend Network Error Logging to support Signed Exchange.

The Signed Exchange feature enables content publishers to sign their content using their own private keys. User Agents (UAs) can trust the signed content as if it were served from the publisher’s origin, even if it is served from a distributor’s origin. Even if there are no network errors, the UA may fail to load the signed content (for example, because the signature of the content has expired). This case is not currently covered by the Network Error Logging feature, so neither publishers nor distributors can see the errors that occur in the user’s environment. I want to extend the Network Error Logging feature to enable both publishers and distributors to investigate signed exchange loading errors such as certificate verification errors.

Distributor side reporting

A publisher (publisher.example) signed the article (https://publisher.example/article.html) as article.html.sxg. A distributor (distributor.example) is distributing the content at https://distributor.example/publisher.example/article.html.sxg and the certificate of publisher.example at https://distributor.example/publisher.example/cert.

If the distributor wants to investigate the signed exchange logs, the distributor sets the Report-To and NEL headers in the HTTP response. This is the same as in the existing Network Error Logging feature.

Report-To: {"group": "sxg-errors",
            "max_age": 10886400,
            "endpoints": [{ "url": "https://report.distributor.example/" }] }
NEL: {"report_to": "sxg-errors", "max_age": 2592000}

Once the UA has received the Report-To and NEL headers, if it later fails to load (by prefetch or navigation) signed exchange content from that origin (https://distributor.example) because the signature has expired, the UA should send the report to the endpoint https://report.distributor.example/.

{
  "type": "signed-exchange",
  "age": 1,
  "url": "https://distributor.example/publisher.example/article.html.sxg",
  "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) ...",
  "body": {
    "type": "failed",
    "outer_url": "https://distributor.example/publisher.example/article.html.sxg",
    "inner_url": "https://publisher.example/article.html",
    "cert_url": "https://distributor.example/publisher.example/cert",
    "sampling_fraction": 1
  }
}
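As a rough illustration of how the two headers above fit together, the following sketch resolves the endpoint URLs that a NEL policy points at. It assumes a single Report-To group per header value (the real header may carry several) and does no expiry handling; the function name `resolve_endpoints` is hypothetical and not part of any spec.

```python
import json

# Header values copied from the distributor example above.
REPORT_TO = (
    '{"group": "sxg-errors", "max_age": 10886400, '
    '"endpoints": [{"url": "https://report.distributor.example/"}]}'
)
NEL = '{"report_to": "sxg-errors", "max_age": 2592000}'


def resolve_endpoints(report_to_header: str, nel_header: str) -> list:
    """Return the endpoint URLs of the Report-To group named by the NEL policy.

    Assumption: each header value holds exactly one JSON object.
    """
    group = json.loads(report_to_header)
    policy = json.loads(nel_header)
    if group.get("group") != policy.get("report_to"):
        # The NEL policy names a group that this Report-To header does not define.
        return []
    return [endpoint["url"] for endpoint in group.get("endpoints", [])]
```

Calling `resolve_endpoints(REPORT_TO, NEL)` yields the single distributor endpoint from the example; a NEL policy naming a different group yields an empty list.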

Publisher side reporting

If the publisher wants to investigate the signed exchange logs, the publisher sets the Report-To and NEL headers in the signed response headers of the signed exchange.

Report-To: {"group": "sxg-errors",
            "max_age": 10886400,
            "endpoints": [{ "url": "https://report.publisher.example/" }] }
NEL: {"report_to": "sxg-errors", "max_age": 2592000}

As with distributor side reporting, the UA should send the report to the endpoint https://report.publisher.example/ when the UA fails to navigate to the signed exchange content.

{
  "type": "signed-exchange",
  "age": 1,
  "url": "https://publisher.example/article.html",
  "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) ...",
  "body": {
    "type": "failed",
    "outer_url": "https://distributor.example/publisher.example/article.html.sxg",
    "inner_url": "https://publisher.example/article.html",
    "cert_url": "https://distributor.example/publisher.example/cert",
    "sampling_fraction": 1
  }
}

UAs must not send the report to the publisher when the UA loaded the signed exchange for prefetching, because prefetching signed exchange must be done in a privacy-preserving manner.
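The prefetch rule can be sketched as a small recipient-selection function. This is only an illustration of the rule as stated in this proposal; the labels "prefetch" and "navigation" and the function name `report_recipients` are assumptions made for the example.

```python
def report_recipients(load_kind: str,
                      distributor_has_policy: bool,
                      publisher_has_policy: bool) -> list:
    """Pick who may receive a report for a failed signed exchange load.

    load_kind is "prefetch" or "navigation" (hypothetical labels).
    """
    recipients = []
    if distributor_has_policy:
        # The distributor is reported to for both prefetch and navigation.
        recipients.append("distributor")
    if publisher_has_policy and load_kind == "navigation":
        # Publisher reports are suppressed for prefetches, so the publisher
        # does not learn about a load the user never navigated to.
        recipients.append("publisher")
    return recipients
```

For a prefetch with both policies present, only the distributor is selected; for a navigation, both are.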

Error types

The UA will expose the following error types.

The following detailed error types should be used only when the reporting origin (the distributor or the publisher) is the same as the origin of the cert_url of the signed exchange. See the "Hiding detailed error type" section.

  • non_secure_origin
  • parse_error
  • unsupported_version
  • network_error
  • cert_fetch_error
  • cert_parse_error
  • signature_verification_error
  • cert_verification_error
  • ct_verification_error
  • ocsp_error
  • cert_requirements_not_met
  • mi_error

Hiding detailed error type

If the reporting origin (the distributor or the publisher) is different from the origin of cert_url, the UA must send only ok or failed. This is intended to avoid leaking cross-origin information about cert_url.
For example, if the UA sent the detailed error types, an attacker could perform port scanning by checking whether the error type is cert_fetch_error or cert_parse_error. Example:

@dcreager would love to hear your thoughts on this one. Arguably, it's pushing on the NEL boundaries, but I think the use case is critical for deploying SxG in the wild and conceptually fits well into NEL — it is addressing load failures. WDYT?

/cc @yoavweiss @toddreifsteck

Broadly speaking I think this is a good addition! I'll try to take a closer look tonight or tomorrow, my very rough initial thoughts are all along the lines of "can we integrate this into NEL more tightly?". For instance, do we need a new report type? Or can we just create network-error reports and just add the outer_url and cert_url fields to the existing body? If signed exchanges itself carried those URLs in response headers in distributor's response, then we could even use the new capability from #96 to include this without having to add custom fields to body.

We can also maybe reuse some of the report downgrading logic to handle the "hiding detailed error type" use case.

Thoughts?

Hmm, interesting. So we could..

  • Define a new group of sxg errors ~sxg.<optional-subgroup>.<name>
  • Potentially, a set of optional body fields: outer, inner, certurl
    • (note: not sure on whether and if it makes sense to communicate those as headers)
  • Aligning with downgrading makes sense, I think current logic in NEL is addressing similar leak?

@horo-t thoughts?

Dropping the signed-exchange type and introducing an sxg. prefix for the error types sounds good to me.

outer_url and inner_url and cert_url are different from headers.
So I think we need the new optional body fields for them.

Does the downgrading logic mean this section "Origins with multiple IP addresses"?
https://w3c.github.io/network-error-logging/#origins-with-multiple-ip-addresses
Ah, yes. We should check the IP address rather than checking the origin.

Signed Exchange report body

If we extend the existing body, the signed exchange report would be like this:

To the distributor's endpoint:

{
  "type": "network-error",
  "url": "https://distributor.example/publisher.example/article.html.sxg",
  "age": 165,
  "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) ...",
  "body": {
    "referrer": "https://aggregator.example/article.html",
    "sampling_fraction": 1,
    "server_ip": "123.122.121.120",  // The IP address of distributor.example.
    "protocol": "http/1.1",
    "method": "GET",
    "status_code": 200,
    "elapsed_time": 1234,
    "phase": "sxg",
    "type": "sxg.failed",

    "outer_url": "https://distributor.example/publisher.example/article.html.sxg",
    "inner_url": "https://publisher.example/article.html",
    "cert_url": "https://distributor.example/publisher.example/cert"
  }
}

To the publisher's endpoint:

{
  "type": "network-error",
  "url": "https://publisher.example/article.html",
  "age": 234,
  "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) ...",
  "body": {
    "referrer": "https://aggregator.example/article.html",
    "sampling_fraction": 1,
    "server_ip": "123.122.121.120",  // The IP address of distributor.example.
    "protocol": "http/1.1",
    "method": "GET",
    "status_code": 200,
    "elapsed_time": 1234,
    "phase": "sxg",
    "type": "sxg.failed",

    "outer_url": "https://distributor.example/publisher.example/article.html.sxg",
    "inner_url": "https://publisher.example/article.html",
    "cert_url": "https://distributor.example/publisher.example/cert"
  }
}

I'm not 100% sure whether it is ok or not to send the IP address of distributor.example to the publisher's endpoint.

Hiding detailed error type

I think this line

If the reporting origin (the distributor or the publisher) is different from the origin of cert url, UA must send only ok or failed.

should be changed to this:

If the IP address from which the UA received the certificate is different from the IP address of the NEL policy, the UA must send only ok or failed.
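A minimal sketch of that check, assuming the UA has recorded the IP address that served the certificate and the IP address from which the NEL policy was received. The function name `reported_type` and its parameters are hypothetical; the type names are the unprefixed ones from the original proposal.

```python
import ipaddress


def reported_type(detailed_type: str,
                  cert_server_ip: str,
                  policy_server_ip: str) -> str:
    """Return the error type that may be exposed in the report."""
    if detailed_type == "ok":
        return "ok"
    # Parse both addresses so that equivalent textual forms compare equal.
    same_server = (ipaddress.ip_address(cert_server_ip)
                   == ipaddress.ip_address(policy_server_ip))
    # Detailed types are only exposed when the certificate came from the
    # same IP address as the NEL policy; otherwise downgrade to "failed".
    return detailed_type if same_server else "failed"
```

With matching addresses, a type such as cert_fetch_error is passed through; with differing addresses it is reported as failed.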

@igrigorik , @dcreager Does this match your idea?

@horo-t overall, yeah that looks reasonable. With respect to #100 ...

@dcreager what's your take on allowing report-specific fields in the body, like what @horo-t is suggesting in #100? That formulation opens up body to carry arbitrary sets of fields, which has me slightly worried since that means there is no longer a well specified/known list of fields and upstream specs can (will) start pushing arbitrary data / bloat the reports.

  • If we want to allow report-specific fields, should we quarantine them under a top-level key?
    • At a minimum, we need to ensure that we're not overriding existing keys.
  • Ideally, the fields themselves would be known by NEL, instead of free for all?

That formulation opens up body to carry arbitrary sets of fields, which has me slightly worried since that means there is no longer a well specified/known list of fields and upstream specs can (will) start pushing arbitrary data / bloat the reports.

Yeah I agree with this concern; I think @horo-t's latest draft on #100 (where it's a predefined schema) is a better stab at this.

Although now that I think of it a bit more, maybe I don't agree as much... 😉 I think the current #100 muddies the waters a bit: I like the IDL definition of the new fields, but that new set of fields is specific to Signed Exchange responses, and so you could also argue that the IDL definition really belongs over in the SXG spec.

I take your point about wanting to avoid a free-for-all with new unspecified fields in NEL reports. But for me, that concern is more about preventing user-agent implementors from adding unspecified fields. I'm less worried about downstream Web specs adding new fields, because that will at least go through some vetting process that allows us to raise concerns if we think they're getting out of hand. The key for me is that we want to disallow unspecified fields, and not that we want to require all fields to be specified specifically in the NEL spec.

So what about the following:

  1. Update NEL so that "Generate a network report" returns the NEL report, instead of queuing it for delivery, and add a new algorithm "Deliver a network report" that does the delivery.
  2. Move the definition of AdditionalReportBody over into the SXG spec, and update WICG/webpackage#374 to:
    • Call "Generate" to get the NEL report.
    • Add a new field to the report body called sxg (or something like that) whose value is an instance of AdditionalReportBody.
    • Call "Deliver" to deliver the report.
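To make the proposed split concrete, here is a rough sketch of the three steps. All function names, the in-memory queue, and the reduced set of body fields are assumptions for illustration; the actual algorithms would be defined in the NEL and SXG specs.

```python
# Stand-in for the UA's report delivery queue (hypothetical).
delivery_queue = []


def generate_network_report(url: str, error_type: str) -> dict:
    """NEL side: build and return the report instead of queuing it."""
    return {
        "type": "network-error",
        "url": url,
        "body": {"phase": "sxg", "type": error_type, "sampling_fraction": 1},
    }


def deliver_network_report(report: dict) -> None:
    """NEL side: hand a finished report to the delivery queue."""
    delivery_queue.append(report)


def queue_signed_exchange_report(outer_url: str, inner_url: str,
                                 cert_url: str, error_type: str) -> None:
    """SXG side: generate, attach the SXG-specific fields, then deliver."""
    report = generate_network_report(outer_url, error_type)
    # The SXG-defined fields live under a single key in the body, so they
    # cannot override keys that NEL itself defines.
    report["body"]["sxg"] = {
        "outer_url": outer_url,
        "inner_url": inner_url,
        "cert_url": cert_url,
    }
    deliver_network_report(report)
```

Keeping "Generate" free of side effects is what lets the SXG spec extend the body before delivery without NEL needing to know about the extra fields.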

(Also I apologize if that seems to contradict my earlier comment! I'm not trying to go back-and-forth, but I do think my more recent suggestion could be a better separation of concerns.)

I updated #100 to 'Introduce a new algorithm "Deliver a network report"', and updated WICG/webpackage#374 to use the algorithm.
How about this change?

The key for me is that we want to disallow unspecified fields, and not that we want to require all fields to be specified specifically in the NEL spec.

@dcreager yep, makes sense. The mechanism you proposed makes sense to me.

My only other constraint (or, "desire" is probably more apt) is that the payload is laid out in a predictable way. For example, if I'm implementing a collector or aggregator, I'd like to have a predictable path for baseline processing of any report, regardless of its type...

  • The top level fields of the report are defined by NEL and I can rely on them being there
  • The report specific metadata is contained in a predictably placed namespace (body)

I think both of the above are maintained in the proposed setup, and body contents are defined in SxG spec, which keeps all the definitions in correct places.
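From a collector's point of view, that predictability could look like the sketch below. The list of top-level fields is taken from the example reports in this thread, and the function name `baseline_fields` is hypothetical.

```python
# Top-level fields as they appear in the example reports in this thread.
TOP_LEVEL_FIELDS = ("type", "url", "age", "user_agent", "body")


def baseline_fields(report: dict) -> dict:
    """Validate the common envelope and return the fields every report has."""
    missing = [name for name in TOP_LEVEL_FIELDS if name not in report]
    if missing:
        raise ValueError(f"report is missing top-level fields: {missing}")
    if not isinstance(report["body"], dict):
        raise ValueError("report body must be an object")
    # Report-specific metadata stays inside body and is not interpreted here.
    return {name: report[name] for name in TOP_LEVEL_FIELDS if name != "body"}
```

A collector can run this step on any report and leave interpretation of body to type-specific handlers.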

@horo-t it looks like WICG/webpackage@af43527 landed support for reporting to distributor only, what about the publisher use case that you highlighted in your proposal? Are you planning to land that as well?

Currently I don't have a plan to land the publisher side reporting in the short term.

This is because there could be several privacy and security concerns on the publisher side.
It may reveal the URL of the signed exchange to the publisher, which is not available to them now.

Hmm.. I'd love to understand the potential concerns over surfacing the URL of the signed exchange. Any existing docs or threads that you can point me to, or maybe you have a short tl;dr top of mind?

Sorry I don't have a concrete document.
But I don't want to add new features that can be used for user tracking.

@horo-t makes sense, but I'm still murky on how a distributor URL might enable that. Is the answer, we don't know and hence we're erring on the side of safety, or do we have a concrete case in mind that we need to consider?

If the aggregator has a link to an SXG whose URL contains a TRACKING_ID, and the publisher can learn the SXG's URL, the aggregator can send the TRACKING_ID to the publisher.
Example:
https://TRACKING_ID.distributor.example/publisher.example/article.html.sxg
https://distributor.example/TRACKING_ID/publisher.example/article.html.sxg

(Redundantly posting this on all the issues where the tracking question came up.)
In a UA that hasn't addressed other forms of tracking via Referer and request URLs, this method of tracking isn't any better than those forms and so isn't an issue. In a UA that has addressed the other forms, by stripping parts of URLs, it can strip the same parts of the distributor's URLs from the error report. So we shouldn't treat anti-tracking as a reason not to report the distributor's URL. There may still be other reasons.

What happens if the cert URL is a data URL? Will it always be considered cross-origin to the reporting origin?

The report is not "downgraded" if the cert URL is a data URL. See step 7 of "Queuing signed exchange report".
https://wicg.github.io/webpackage/loading.html#queue-report