ClickHouse/github-explorer

Download: HTTP Range Request Not Supported

iwinux opened this issue · 4 comments

Hi,

I've been trying to download the 83GB TSV dataset. The connection keeps getting interrupted, and each time I have to start over because the server always responds with Content-Range: bytes 0-89432430895/89432430896, i.e. from the first byte instead of where the download stopped.

Is it possible to fix this, or is there an alternative way to fetch this dataset?
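For context on why a restart is forced: to resume, a client sends Range: bytes=N-, where N is the size of the partial file, and a server that honors it replies 206 Partial Content with a Content-Range starting at N. A reply that starts at byte 0 cannot be appended to the partial file. A minimal POSIX shell sketch of that check; range_header_for, content_range_start and safe_to_append are hypothetical helper names, not part of curl or any other tool.

```shell
# Hypothetical helper: build the Range header that resumes a partial download.
# The offset is the number of bytes already on disk (0 if the file is missing).
range_header_for() {
    partial_file=$1
    if [ -f "$partial_file" ]; then
        # wc pads its output with spaces on some systems, so strip them.
        offset=$(wc -c < "$partial_file" | tr -d ' ')
    else
        offset=0
    fi
    printf 'Range: bytes=%s-\n' "$offset"
}

# Hypothetical helper: print the first byte position from a header line such as
# "Content-Range: bytes 0-89432430895/89432430896" (header names are case-insensitive).
content_range_start() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z' |
        sed -n 's/^content-range: *bytes *\([0-9]*\)-.*/\1/p'
}

# Appending the response body is only safe when the server resumed at the requested offset.
safe_to_append() {
    expected=$1
    header=$2
    [ "$(content_range_start "$header")" = "$expected" ]
}
```

With the header quoted above, safe_to_append fails for any non-zero offset, which is exactly why the download has to start over.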

The response to a HEAD request has no Accept-Ranges: bytes header:

$ curl -I 'https://datasets.clickhouse.com/github_events/tsv/github_events_v2.tsv.xz'
HTTP/2 200 
date: Tue, 13 Dec 2022 03:12:21 GMT
content-type: text/tab-separated-values
content-length: 89432430896
x-amz-id-2: G6Yi4dq3k83WF2oziDrxLZkhMHCDZ+80h0XoxdhYsCJCFq284b2y9jbVcYI9QOGbTEbC2qbd8rQ=
x-amz-request-id: V2XBQ3HC5QHKTTAB
last-modified: Mon, 07 Feb 2022 02:06:46 GMT
etag: "e5d93b8c838cfdd9a2a1010680d6a942-5331"
cache-control: max-age=31536000
cf-cache-status: MISS
strict-transport-security: max-age=0; includeSubDomains; preload
x-content-type-options: nosniff
server: cloudflare
cf-ray: 778b84842b8f0cf3-LAX
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
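The same check can be scripted. A minimal sketch, assuming the headers come from curl -sI (which keeps the trailing CR on each line); the function name advertises_byte_ranges is hypothetical. A missing Accept-Ranges header is only a hint: the definitive test is whether a ranged GET comes back as 206 Partial Content.

```shell
# Hypothetical helper: read HTTP response headers on stdin and succeed only if
# the server advertises byte-range support ("Accept-Ranges: bytes").
# Header names are case-insensitive and curl keeps the trailing CR,
# so both are normalized before matching.
advertises_byte_ranges() {
    tr -d '\r' | tr 'A-Z' 'a-z' | grep -q '^accept-ranges: *bytes *$'
}
```

Usage: curl -sI 'https://datasets.clickhouse.com/github_events/tsv/github_events_v2.tsv.xz' | advertises_byte_ranges && echo resumable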

This is a limitation of Cloudflare, which we use to proxy these links.
It is one of the measures that prevent free accounts from being used for video hosting.

Here is another link:
https://clickhouse-public-datasets.s3.amazonaws.com/github_events/tsv/github_events_v2.tsv.xz

It can be used as an alternative.
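With a server that honors ranges, an interrupted download can be continued from where it stopped instead of restarted. A minimal retry loop, assuming curl is installed; fetch_resumable, max_attempts and RETRY_DELAY are hypothetical names. curl's -C - option reads the size of the existing output file and requests only the rest, and -f makes curl fail on HTTP errors instead of writing an error page into the file.

```shell
# Hypothetical wrapper: keep calling curl until it succeeds, resuming from the
# bytes already on disk each time (-C -). Gives up after max_attempts tries.
fetch_resumable() {
    url=$1
    out=$2
    max_attempts=${3:-50}
    attempt=1
    until curl -f -C - -o "$out" "$url"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        # Pause between attempts; the delay is configurable for testing.
        sleep "${RETRY_DELAY:-5}"
    done
}
```

For example: fetch_resumable 'https://clickhouse-public-datasets.s3.amazonaws.com/github_events/tsv/github_events_v2.tsv.xz' github_events_v2.tsv.xz. Once it returns successfully, xz -t github_events_v2.tsv.xz checks that the compressed stream is intact.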

Thank you! The alternative link is working.

I have uploaded the updated dataset:
https://clickhouse-public-datasets.s3.amazonaws.com/github_events/tsv/github_events_v3.tsv.xz

Good luck with your analysis.
I would be interested to hear about your research if you have something to share.