chromium/hstspreload.org

Introduce a canonical preload list "source of truth" separate from the Chromium repo


e.g. an endpoint like /api/v2/preload-list

The goal would be for all major browsers, including Chrome, to pull the current list from that URL.
That way, the Chromium process doesn't introduce additional lag for new additions and removals, which can currently add extra months of delay for affected changes due to mismatched release cycles.
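For concreteness, here's a minimal sketch of what a consumer of such an endpoint might look like, in Go. The endpoint URL and JSON shape are assumptions (nothing is specified yet); the entry fields loosely mirror Chromium's transport_security_state_static.json:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Entry loosely mirrors the fields in Chromium's
// transport_security_state_static.json; the exact schema of the
// hypothetical endpoint is still to be decided.
type Entry struct {
	Name              string `json:"name"`
	IncludeSubdomains bool   `json:"include_subdomains"`
	Mode              string `json:"mode"` // e.g. "force-https"
}

func fetchPreloadList() ([]Entry, error) {
	// Hypothetical endpoint; it does not exist yet.
	resp, err := http.Get("https://hstspreload.org/api/v2/preload-list")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var entries []Entry
	if err := json.NewDecoder(resp.Body).Decode(&entries); err != nil {
		return nil, err
	}
	return entries, nil
}

func main() {
	entries, err := fetchPreloadList()
	if err != nil {
		panic(err)
	}
	fmt.Printf("fetched %d preloaded entries\n", len(entries))
}
```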

This needs to be done carefully. @sleevi tells me that https://github.com/publicsuffix/list can't change its format without risk of breaking stuff, because they don't even know who all the consumers are.

(We could, say, require consumers of the list to register for an API key to get access to the endpoint, but that seems like a little overkill.)

e.g. an endpoint like /api/v2/preload-list

Sounds good.

The goal would be for all major browsers, including Chrome, to pull the current list from that URL.
That way, the Chromium process doesn't introduce additional lag for new additions and removals, which can currently add extra months of delay for affected changes due to mismatched release cycles.

With this new system, you could automate additions to and removals from the list without manual interaction, which is better in the long term as the list is only going to get larger. Having said that, maybe another endpoint like /api/v2/latest is necessary, since downloading the whole list every time is time-consuming and wastes resources just to get the latest additions and removals.
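A sketch of what a delta response from such a /api/v2/latest endpoint could look like, again in Go; the endpoint, field names, and cursor scheme are all hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Delta is one possible shape for an incremental-update response from a
// hypothetical /api/v2/latest endpoint: the client sends back the cursor
// from its previous sync and receives only what changed since then.
type Delta struct {
	Additions []string `json:"additions"` // hostnames added since the cursor
	Removals  []string `json:"removals"`  // hostnames removed since the cursor
	Cursor    string   `json:"cursor"`    // opaque token to send on the next request
}

func main() {
	// Example response body a client might receive (illustrative only).
	body := `{"additions":["example.com"],"removals":["old.example"],"cursor":"abc123"}`
	var d Delta
	if err := json.Unmarshal([]byte(body), &d); err != nil {
		panic(err)
	}
	fmt.Printf("+%d -%d, next cursor %q\n", len(d.Additions), len(d.Removals), d.Cursor)
}
```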

(We could, say, require consumers of the list to register for an API key to get access to the endpoint, but that seems like a little overkill.)

Having consumers register for an API key to get access isn't overkill. It gives you an overview of who's using the API endpoint, and having them register means you can inform consumers of any sudden or pending changes to the API.
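For illustration, a minimal sketch of how such API-key gating could work on the server side, using Go's standard net/http; the header name and the in-memory key store are assumptions:

```go
package main

import "net/http"

// requireAPIKey is a sketch of gating the endpoint behind registered
// keys. The "X-API-Key" header and the in-memory key set are
// illustrative; a real deployment would store and rotate keys properly.
func requireAPIKey(validKeys map[string]bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !validKeys[r.Header.Get("X-API-Key")] {
			http.Error(w, "missing or unknown API key", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	keys := map[string]bool{"example-key": true} // issued at registration (hypothetical)
	list := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("[]")) // placeholder list body
	})
	http.Handle("/api/v2/preload-list", requireAPIKey(keys, list))
	http.ListenAndServe(":8080", nil)
}
```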

@lgarron This is going to involve a lot of work implementing the new system. Do you want some extra help?

Maybe; I need to come up with a roadmap first.
Ping me at the end of February if this hasn't gone anywhere?

Another idea I mentioned in #78: Maintain a GitHub repository with the preload list.
This provides an auditable transparency log, and allows pull requests for special cases.

Another idea I mentioned in #78: Maintain a GitHub repository with the preload list.
This provides an auditable transparency log, and allows pull requests for special cases.

@lgarron If we went with this model, how would you protect the list from unauthorised tampering?

@lgarron If we went with this model, how would you protect the list from unauthorised tampering?

@konklone and I are discussing that right now. We'd need a way to know which accounts are authorized to submit pull requests for which eTLDs.

Automated submissions would still need to go through the normal process.

So this may become a private closed repo on GitHub?

No, it would mean that the maintainers of the repository would need to know, when a pull request is submitted, whether the submitters of the pull request are authorized to represent the eTLD whose hostnames are contained in the pull request.

In addition to the above, would authorised persons hold a PGP key to verify pull requests for a specific eTLD?

Probably not, since PGP key management is unlikely to be viable for many eTLD operators.

Unauthorized tampering can come in many forms, and I believe GitHub is not the most suitable place to host such a critical list.

Unauthorized tampering can come in many forms, and I believe GitHub is not the most suitable place to host such a critical list.

As a similar precedent, the public suffix list is hosted on GitHub. Git revisions are also authenticated, which prevents tampering with the historical log.

Trusted users are necessary anyhow, and GitHub is a fairly secure way to authenticate arbitrary users.
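To make the tamper-evidence point concrete: each git commit hash covers its parent's hash, so rewriting any historical commit changes every subsequent hash. A toy illustration in Go (SHA-256 here for simplicity; git itself uses SHA-1 over full commit objects):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// commitHash mimics the chaining property of git commits: the hash
// covers both the content and the parent's hash.
func commitHash(parent, content string) string {
	sum := sha256.Sum256([]byte(parent + "\n" + content))
	return fmt.Sprintf("%x", sum)
}

func main() {
	c1 := commitHash("", "add example.gov")
	head := commitHash(c1, "add example2.gov")
	fmt.Println("head:", head)

	// Rewriting the first commit produces a different head hash, which
	// anyone who recorded the old head can detect.
	t1 := commitHash("", "add evil.example")
	fmt.Println("tampered head:", commitHash(t1, "add example2.gov"))
}
```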

The Public Suffix List simply requires the creation of a TXT record to indicate the GitHub PR.

We explicitly do not try to maintain a list of authorized user accounts, as those inevitably get stale. Instead, simple and practical demonstrations of authorization are sufficient.

This does mean more work for the domain holder, but avoids any ambiguity on authorization or commitment.
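As I understand it, the PSL convention is a TXT record at _psl.&lt;domain&gt; containing the pull request URL. Here's a minimal sketch in Go of what a maintainer-side check could look like (illustrative, not the PSL's actual tooling):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// txtAuthorizes checks whether a _psl TXT record on the domain
// references the given pull request URL.
func txtAuthorizes(domain, prURL string) (bool, error) {
	records, err := net.LookupTXT("_psl." + domain)
	if err != nil {
		return false, err
	}
	for _, record := range records {
		if strings.Contains(record, prURL) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ok, err := txtAuthorizes("example.com", "https://github.com/publicsuffix/list/pull/1234")
	fmt.Println(ok, err)
}
```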

The eTLD representative could effectively automate the entire process from start to end. The process would just involve adding a simple random-hash TXT record to the chosen domain's DNS, and using a generic PR template it could be included in the preload list without fuss.

The Public Suffix List simply requires the creation of a TXT record to indicate the GitHub PR.

We explicitly do not try to maintain a list of authorized user accounts, as those inevitably get stale. Instead, simple and practical demonstrations of authorization are sufficient.

That seems like a pretty reasonable approach for a one-time transaction ("include me in the PSL"). For this, at least for the time being, we'd be doing regular PRs on a hopefully fairly frequent basis with fresh batches of domains. The TXT record approach, which would involve one-off modifications to the production .gov DNS, is likely to add significant friction to such a process.

The PSL is also already a large-scale project with participation from a high number of public suffixes. For now, there's only 1 eTLD with formally expressed interest in pursuing this approach, and 1 eTLD I'm aware of with informally expressed interest, so it may be reasonable to pursue a less easily scalable approach at first and change it later.

In addition, in the .gov case, we may (hopefully) get to the point where all new domains are included (not just executive) and so we can reduce the number of transactions by first asking to preload *.gov except X,000 legacy domains. Future transactions would be about deleting batches of legacy domains (and at least some of those could be indicated through publishing an HSTS header, since these would be existing domains).

The eTLD representative could effectively automate the entire process from start to end. The process would just involve adding a simple random-hash TXT record to the chosen domain's DNS, and using a generic PR template it could be included in the preload list without fuss.

In theory, that's definitely true. In practice, I think this would be unworkable for .gov and GSA, since DNS changes to .gov itself are managed with intense bureaucratic care, the relevant GSA program office does not have engineering capabilities in-house, and deploying a new in-house production system to automate this kind of task would require a substantial investment in compliance and authorization work.

Though the US government may be at the extreme end, I expect a variety of eTLD operators in the world to be in a similar position. The process will have broader applicability if it can work for participants with limited automation capabilities.

Though the US government may be at the extreme end, I expect a variety of eTLD operators in the world to be in a similar position. The process will have broader applicability if it can work for participants with limited automation capabilities.

Though I should also say, hopefully most eTLD operators will be able to take a blunter hand than .gov and take the approach described above (*.etld except for X legacy domains), which makes a number of problems go away. So it could be that an immediate-term process for .gov ends up getting discarded in the long run no matter what.

How many .gov domains are registered?

btw the bug for preloading .gov/eTLDs is #78. ;-)

I think @konklone is planning to reply to @ByJamesBurton's last comment, but keeping the rest of the discussion in #78 will keep this bug cleaner for what we need it for. :-)

How many .gov domains are currently registered?

It's ~5,650 -- GSA posts a copy of this list here:
https://github.com/GSA/data/blob/gh-pages/dotgov-domains/current-full.csv

(Though relying on that repository for the official to-be-preloaded data file would be a significant thing -- the repository is not currently used for security-critical work.)

Around ~1,100 of those domains are used by the federal government's executive branch. (Most are state/local.) The announced .gov preloading plan covers newly issued domains (going forward) for the federal government's executive branch, and the rate of issuance in that subset probably ranges from a handful of domains per month up to maybe 20 domains a month at maximum.

@lgarron Moving to #78! =)

Moving the source of truth is no longer a goal for me (as of a few months ago).
It's not out of the question, but it's not necessary for anything now that the Chromium list can be updated cheaply.