ipfs-shipyard/nopfs

HTTPSubscriber: detect when list is not append-only

lidel opened this issue · 2 comments

Extracted from https://github.com/protocol/badbits.dwebops.pub/issues/32733

We are working on simplifying denylist handling at ipfs.io/dweb.link (https://github.com/ipshipyard/waterworks-infra/issues/113) and want to rely solely on nopfs support in rainbow, where a denylist is passed via RAINBOW_DENYLISTS=https://badbits.dwebops.pub/badbits.deny.

Problem

Right now, the list is sorted, so new double-hashed entries can be inserted in the middle of the denylist rather than appended at the end.

This runs into a current limitation of HTTPSubscriber.downloadAndAppend(), which assumes every list is append-only and makes a blind Range request for the bytes beyond the ones it already has.

As a result, subscribing to the current badbits list, or to any third-party list that is not append-only, is error-prone: the client will miss updates that were inserted in the middle of the file rather than appended at the end (example).

Proposed solution

HTTPSubscriber could remember the last rule it processed. If that same rule shows up again in the Range response, i.e. beyond the end of the old file, the update has likely inserted entries earlier in the file.

In that case, the subscriber would discard the Range response and re-download the entire file to ensure no updates are missed.
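The detection step could look roughly like this minimal sketch, which assumes the subscriber keeps the last line of the previous download in memory and receives the Range response body as bytes. `needsFullRefresh` is a hypothetical name, not an existing nopfs function.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// needsFullRefresh implements the heuristic proposed above: if the last rule
// of the previously downloaded file shows up again in the bytes returned by
// the Range request, entries were probably inserted before it, so the tail
// cannot be applied on its own. Name and signature are hypothetical.
//
// It is only a heuristic: it misses insertions that shift the old last rule
// by less than the length of its own line, and it cannot see deletions.
func needsFullRefresh(lastRule string, rangeBody []byte) bool {
	if lastRule == "" {
		return false
	}
	sc := bufio.NewScanner(bytes.NewReader(rangeBody))
	for sc.Scan() {
		// Compare whole lines only; the first line may be a partial line
		// if the old file length now falls mid-line.
		if sc.Text() == lastRule {
			return true
		}
	}
	return false
}

func main() {
	lastRule := "//ccc"            // last line of the old file
	appended := []byte("//ddd\n")  // genuine append: safe to apply
	shifted := []byte("//ccc\n")   // old last rule reappears: insertion upstream

	fmt.Println(needsFullRefresh(lastRule, appended)) // false
	fmt.Println(needsFullRefresh(lastRule, shifted))  // true
}
```

When it returns true, the subscriber would throw the Range response away and fetch the whole list again. A partial first line in the tail can at worst trigger an unnecessary full refresh.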

Once #38 is also implemented, the full file would only be downloaded when it has actually been updated.

It is not sustainable to re-download and re-process everything every minute because a single line was added to the file (as currently happens for the ipfs.io gateways). Adding such a feature allows people to keep using non-append-only lists. What I'd like is to force people to use append-only lists.

It also means more code, when really no more code is necessary.

Also, it is simpler to fix badbits to publish append-only lists than to implement support for sorted lists.
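Publishing an append-only list means every new revision must begin with the exact bytes of the previous one. A publisher could gate each release on a check like the following minimal sketch; `isAppendOnly` is hypothetical and makes no claim about how badbits is actually generated.

```go
package main

import (
	"bytes"
	"fmt"
)

// isAppendOnly reports whether newList is oldList with zero or more bytes
// appended, i.e. whether a client holding oldList can safely catch up by
// fetching only newList[len(oldList):]. Hypothetical publisher-side check.
func isAppendOnly(oldList, newList []byte) bool {
	return bytes.HasPrefix(newList, oldList)
}

func main() {
	oldList := []byte("//aaa\n//ccc\n")
	fmt.Println(isAppendOnly(oldList, []byte("//aaa\n//ccc\n//bbb\n"))) // true
	fmt.Println(isAppendOnly(oldList, []byte("//aaa\n//bbb\n//ccc\n"))) // false
}
```

Unlike the client-side heuristic, this check is exact, because the publisher has both full revisions at hand.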