mxsasha/nrtmv4

Write something about gzip

mxsasha opened this issue · 6 comments

Recommending GZIP seems useful especially for snapshots. RPSL compresses well. Should this be done by the HTTP server? Served as gzipped files? Hash/signature of plaintext or gzipped content? If widely supported, recommending gzip compression at the HTTP server level seems easiest.
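To make the hash/signature question concrete, here is a minimal sketch (not taken from the draft) of why hashing the plaintext may be the more stable choice. The sample RPSL object and variable names are made up for illustration. The same input compressed with different gzip settings produces different bytes, while the decompressed content always hashes the same.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical snapshot content: one made-up RPSL object repeated many times.
snapshot = b"route: 192.0.2.0/24\norigin: AS65000\nsource: EXAMPLE\n\n" * 1000

# Two valid gzip encodings of the same plaintext. They differ in compression
# level and in the mtime field stored in the gzip header.
gz_a = gzip.compress(snapshot, compresslevel=1, mtime=0)
gz_b = gzip.compress(snapshot, compresslevel=9, mtime=1)

plaintext_hash = sha256_hex(snapshot)

# Hashes of the compressed bytes depend on how the server compressed them.
compressed_hashes_differ = sha256_hex(gz_a) != sha256_hex(gz_b)

# Hashes of the decompressed bytes always match the plaintext hash.
roundtrip_hashes = {
    sha256_hex(gzip.decompress(gz_a)),
    sha256_hex(gzip.decompress(gz_b)),
}

print("compressed hashes differ:", compressed_hashes_differ)
print("plaintext hash stable:", roundtrip_hashes == {plaintext_hash})
```

If the hash covered the gzipped bytes instead, every server and intermediary would have to produce a byte-identical compressed stream, which gzip does not guarantee.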

job commented

We probably need to make a choice: either compress the files and be explicit about it (literally serving files as '.gz', with compression handled by IRRd), or not mention it at all (in which case the client & server will figure it out as they deem fit).

There is some tradition in the IRR context of distributing .gz files.

I am in favor of using gzip to compress the data. However, I am a little skeptical about making it obligatory and being explicit about it. The reason is that today it is gzip, and tomorrow it may be something else. Perhaps we can make compression optional and point developers towards gzip. Maybe mention that it is (strongly) encouraged to use gzip compression?

job commented

I am not concerned about 'tomorrow it will be something else', the last 25 years have been .gz :-)

$ lftp ftp://ftp.radb.net/radb/dbase
cd ok, cwd=/radb/dbase
lftp ftp.radb.net:/radb/dbase> ls
-rw-r--r--    1 0        0               5 Dec 21 06:04 ALTDB.CURRENTSERIAL
-rw-r--r--    1 0        0               5 Jul 04  2019 AOLTW.CURRENTSERIAL
-rw-r--r--    1 0        0               5 Dec 21 05:13 ARIN-NONAUTH.CURRENTSERIAL
-rw-r--r--    1 0        0               6 Dec 21 04:59 ARIN.CURRENTSERIAL
-rw-r--r--    1 0        0               5 Dec 21 06:12 BBOI.CURRENTSERIAL
-rw-r--r--    1 0        0               6 Dec 21 06:30 BELL.CURRENTSERIAL
-rw-r--r--    1 0        0               4 Dec 21 04:39 CANARIE.CURRENTSERIAL
-rw-r--r--    1 0        0               6 Jan 13  2021 EASYNET.CURRENTSERIAL
-rw-r--r--    1 0        0               3 Dec 21 04:43 HOST.CURRENTSERIAL
-rw-r--r--    1 0        0               6 Dec 21 04:55 JPIRR.CURRENTSERIAL
-rw-r--r--    1 0        0               6 Dec 21 06:19 LEVEL3.CURRENTSERIAL
-rw-r--r--    1 0        0               2 Dec 21 05:00 NESTEGG.CURRENTSERIAL
-rw-r--r--    1 0        0               7 Dec 21 04:36 NTTCOM.CURRENTSERIAL
-rw-r--r--    1 0        0               2 Dec 21 06:31 OPENFACE.CURRENTSERIAL
-rw-r--r--    1 0        0               3 Jan 11  2019 OTTIX.CURRENTSERIAL
-rw-r--r--    1 0        0               2 Dec 21 06:21 PANIX.CURRENTSERIAL
-rw-r--r--    1 0        0               7 Dec 21 04:51 RADB.CURRENTSERIAL
-rw-r--r--    1 0        0               6 Dec 21 05:38 REACH.CURRENTSERIAL
-rw-r--r--    1 0        0               3 Jun 22 07:53 RGNET.CURRENTSERIAL
-rw-r--r--    1 0        0               4 Aug 28  2019 RISQ.CURRENTSERIAL
-rw-r--r--    1 0        0               5 Sep 22  2020 ROGERS.CURRENTSERIAL
-rw-r--r--    1 0        0               6 Dec 21 06:11 TC.CURRENTSERIAL
-rw-r--r--    1 0        0          747530 Dec 21 06:04 altdb.db.gz
-rw-r--r--    1 0        0            8996 Jul 04  2019 aoltw.db.gz
drwxr-xr-x    2 0        0          151552 Dec 21 00:00 archive
-rw-r--r--    1 0        0          965963 Dec 21 05:13 arin-nonauth.db.gz
-rw-r--r--    1 0        0         2399550 Dec 21 04:59 arin.db.gz
-rw-r--r--    1 0        0           56990 Dec 21 06:12 bboi.db.gz
-rw-r--r--    1 0        0          279235 Dec 21 06:30 bell.db.gz
-rw-r--r--    1 0        0           47447 Dec 21 04:39 canarie.db.gz
-rw-r--r--    1 0        0           53503 Jan 13  2021 easynet.db.gz
-rw-r--r--    1 0        0             404 Dec 21 04:43 host.db.gz
-rw-r--r--    1 0        0          281181 Dec 21 04:55 jpirr.db.gz
-rw-r--r--    1 0        0         6598134 Dec 21 06:19 level3.db.gz
-rw-r--r--    1 0        0             533 Dec 21 05:00 nestegg.db.gz
-rw-r--r--    1 0        0         3855883 Dec 21 04:36 nttcom.db.gz
-rw-r--r--    1 0        0            2390 Dec 21 06:31 openface.db.gz
-rw-r--r--    1 0        0            5985 Jan 11  2019 ottix.db.gz
-rw-r--r--    1 0        0            1045 Dec 21 06:21 panix.db.gz
-rw-r--r--    1 0        0        19195672 Dec 21 04:51 radb.db.gz
-rw-r--r--    1 0        0          226736 Dec 21 05:38 reach.db.gz
-rw-r--r--    1 0        0            4611 Jun 22 07:53 rgnet.db.gz
-rw-r--r--    1 0        0           20835 Aug 28  2019 risq.db.gz
-rw-r--r--    1 0        0            4552 Sep 22  2020 rogers.db.gz
-rw-r--r--    1 0        0          885968 Dec 21 06:11 tc.db.gz
lftp ftp.radb.net:/radb/dbase>

What matters most is that we are VERY CLEAR - so that the next 25 years things work smoothly without surprises :-)

Hahaha, sure, I don't disagree that it has worked like this for the past 25 years, but there is no guarantee that it will stay that way for the next 25 years.

But I agree that we should be clear in the draft that things must work smoothly for the next 25 years or so.

> We probably need to make a choice: either compress the files and be explicit about it (literally serving files as '.gz', with compression handled by IRRd), or not mention it at all (in which case the client & server will figure it out as they deem fit).

I strongly prefer the second option: let the HTTP layer deal with compression, with gzip or any other algorithm that the client and server see fit.

This breaks with IRR tradition, but to me the current need for gzip arises in large part from having a mix of vague distribution methods, which is the exact problem we're fixing. Also worth noting is that NRTMv3 has zero compression. Compression is well supported in HTTP and widely used. Adding our own layer and having the implementations manually handle gzip seems like reinventing the wheel.
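As a rough illustration of what "let the HTTP layer deal with it" could look like, the sketch below runs a toy server on localhost that gzips the response only when the client sends `Accept-Encoding: gzip`. Everything here is an assumption made for the example: the handler, the `/snapshot` path, and the sample data are hypothetical and not part of IRRd or the NRTMv4 draft. A production deployment would normally leave this to the web server's built-in compression.

```python
import gzip
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical snapshot content: one made-up RPSL object repeated many times.
SNAPSHOT = b"route: 192.0.2.0/24\norigin: AS65000\nsource: EXAMPLE\n\n" * 1000


class SnapshotHandler(BaseHTTPRequestHandler):
    """Toy handler: compresses only if the client advertises gzip support."""

    def do_GET(self):
        body = SNAPSHOT
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        if "gzip" in self.headers.get("Accept-Encoding", ""):
            body = gzip.compress(body)
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch(port: int, accept_gzip: bool):
    """Return (decoded body, number of bytes that crossed the wire)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    headers = {"Accept-Encoding": "gzip"} if accept_gzip else {}
    conn.request("GET", "/snapshot", headers=headers)
    resp = conn.getresponse()
    raw = resp.read()
    encoded = resp.getheader("Content-Encoding") == "gzip"
    conn.close()
    return (gzip.decompress(raw) if encoded else raw), len(raw)


# Port 0 lets the OS pick a free port for the demo server.
server = HTTPServer(("127.0.0.1", 0), SnapshotHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

plain_body, plain_wire = fetch(server.server_port, accept_gzip=False)
gzip_body, gzip_wire = fetch(server.server_port, accept_gzip=True)

server.shutdown()
server.server_close()

print("same content either way:", plain_body == gzip_body == SNAPSHOT)
print("bytes on the wire:", plain_wire, "plain vs", gzip_wire, "gzip")
```

Both clients end up with the same plaintext, so a hash or signature over the uncompressed content works regardless of what was negotiated. Note that the repeated sample data compresses far better than a real database would.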

There is a slight additional cost in disk space while writing out the file: for radb.db.gz we're looking at 393MB uncompressed versus 24MB compressed. When reading files, IRRd at least already decompresses on disk first, so disk use will actually improve slightly when using HTTP compression, but it's all insignificant compared to other storage costs.
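For a sense of scale, the two figures quoted above work out as follows. This is just arithmetic on the 393MB and 24MB numbers; the helper name is made up for the example.

```python
def compression_summary(uncompressed_mb: float, compressed_mb: float):
    """Return (compression ratio, fraction of space saved)."""
    ratio = uncompressed_mb / compressed_mb
    saved_fraction = 1 - compressed_mb / uncompressed_mb
    return ratio, saved_fraction


# Figures quoted above for the RADB database.
ratio, saved = compression_summary(393, 24)
print(f"ratio {ratio:.1f}:1, saved {saved:.1%}")
# prints "ratio 16.4:1, saved 93.9%"
```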

Unless someone wants to disagree on having it in HTTP, there's no further work here :)