
Stream big files over HTTP like they’re local.


HTTP Range Reader


Minimal, production-ready HTTP byte-range reader that behaves like a read-only, seekable file object. It provides a 2-chunk LRU cache, background prefetch of the next chunk, and clean random access into large remote files (for example ZIP archives, tarballs, Parquet files, ISO images) without downloading the whole object.
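Random access on top of fixed-size chunks comes down to mapping each read of `size` bytes at `offset` onto chunk indices and HTTP byte ranges. The following is a minimal sketch under one assumption: chunks are aligned to multiples of `chunk_size`. The helpers `chunk_spans` and `range_header` are hypothetical and not part of the package's API. Note that HTTP byte ranges are inclusive at both ends, while Python slices are half-open.

```python
def range_header(start: int, end: int) -> str:
    """Format a Range header value for bytes [start, end).

    `end` is exclusive here, but HTTP byte ranges are inclusive, hence `end - 1`.
    """
    if end <= start:
        raise ValueError("empty range")
    return f"bytes={start}-{end - 1}"


def chunk_spans(offset: int, size: int, chunk_size: int, total: int):
    """Return (chunk_index, chunk_start, chunk_end) for every aligned chunk
    overlapping the read [offset, offset + size), clipped to the file size."""
    stop = min(offset + size, total)
    if offset >= stop:
        return []  # read starts at or past EOF
    first = offset // chunk_size
    last = (stop - 1) // chunk_size
    spans = []
    for idx in range(first, last + 1):
        start = idx * chunk_size
        end = min(start + chunk_size, total)  # last chunk may be short
        spans.append((idx, start, end))
    return spans


print(chunk_spans(5, 10, 4, 12))  # [(1, 4, 8), (2, 8, 12)]
print(range_header(4, 8))         # bytes=4-7
```

Each span corresponds to one cache entry and, on a miss, one Range request.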

Python 3.9+. Transport: requests (HTTP/1.1).

Features

  • Single file; requests is the only runtime dependency
  • 2-chunk LRU cache (current + previous) to reduce re-fetches on backward seeks
  • Background prefetch of the next chunk for smooth sequential reads
  • If-Range with ETag/Last-Modified, so chunks from different versions of a remote file are never mixed
  • Graceful fallback when a server ignores Range and replies 200 OK with the full body
  • Works anywhere a file-like object works (zipfile, tarfile, PIL.Image.open, etc.)
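The If-Range and 200-fallback features reduce to two decisions: which validator to send, and how to interpret the response. The following is a minimal sketch. It assumes the response is already available as a status code, a headers mapping and a body. The helpers `build_headers` and `extract_range` are hypothetical and not the package's API.

The sketch follows RFC 9110: it prefers a strong ETag and falls back to Last-Modified only when no ETag is known at all. With only a weak ETag it sends no If-Range. In a requests response, `headers` is case-insensitive, so the lookup works regardless of capitalization.

```python
import re

# Matches e.g. "bytes 4-7/10"; the total may be "*" when unknown.
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def build_headers(start, end, etag=None, last_modified=None):
    """Request headers for bytes [start, end); `end` is exclusive."""
    headers = {"Range": f"bytes={start}-{end - 1}"}
    if etag and not etag.startswith("W/"):
        headers["If-Range"] = etag  # strong ETag: the preferred validator
    elif etag is None and last_modified:
        headers["If-Range"] = last_modified  # only when no ETag is known
    # RFC 9110 forbids a weak ETag in If-Range, and a date when an ETag exists.
    return headers


def extract_range(status, headers, body, start, end):
    """Return (data, is_full_body) for the response to a Range request."""
    if status == 206:
        content_range = headers.get("Content-Range", "")
        m = _CONTENT_RANGE.fullmatch(content_range)
        if m is None or int(m.group(1)) != start:
            raise OSError(f"unexpected Content-Range: {content_range!r}")
        return body[: end - start], False
    if status == 200:
        # Range was ignored, or the If-Range validator no longer matched:
        # the body is the whole (possibly updated) file, so slice it.
        return body[start:end], True
    raise OSError(f"HTTP {status}")
```

A 200 in response to a request that carried If-Range means the remote file changed. A reader would then also drop its cached chunks and record the new validator.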

Install

pip install http-range-reader

(or use directly by copying src/http_range_reader/reader.py into your project)

Quickstart

from zipfile import ZipFile
from http_range_reader import HTTPRangeReader

url = "https://github.com/psf/requests/archive/refs/heads/main.zip"
rdr = HTTPRangeReader(url, chunk_size=1024*1024, prefetch=True)

with rdr:
    with ZipFile(rdr) as zf:
        print(len(rdr), "bytes over HTTP")
        print("first 5 entries:")
        for info in zf.infolist()[:5]:
            print("-", info.filename, info.file_size)
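        # the first entry is often a directory; pick the first regular file
        first = next(i for i in zf.infolist() if not i.is_dir())
        data = zf.read(first.filename)
        print("read", len(data), "bytes from", first.filename)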

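The Quickstart works because `zipfile` only needs `read`, `seek` and `tell` on the object it is given. The following is a minimal sketch of such an adapter built on `io.RawIOBase`. It assumes a hypothetical `fetch(start, end)` callable that returns the bytes for `[start, end)`. `RangeFile` is illustrative only and omits the caching and prefetch the real reader adds. An in-memory ZIP stands in for the remote file so the sketch runs offline.

```python
import io
import zipfile


class RangeFile(io.RawIOBase):
    """Read-only, seekable file object over a fetch(start, end) callable.
    Hypothetical sketch: no caching or prefetch, one fetch per read."""

    def __init__(self, fetch, size):
        super().__init__()
        self._fetch = fetch  # returns bytes for [start, end), end exclusive
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos
        return pos

    def readinto(self, buf):
        end = min(self._pos + len(buf), self._size)
        if self._pos >= end:
            return 0  # EOF
        data = self._fetch(self._pos, end)
        n = len(data)
        buf[:n] = data
        self._pos += n
        return n


# Offline demo: an in-memory ZIP plays the role of the remote object.
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as zf:
    zf.writestr("hello.txt", "hello over ranges")
blob = raw.getvalue()

calls = []  # every (start, end) "request" issued by zipfile's reads


def fetch(start, end):
    calls.append((start, end))
    return blob[start:end]


with zipfile.ZipFile(RangeFile(fetch, len(blob))) as zf:
    data = zf.read("hello.txt")
print(data)  # b'hello over ranges'
```

The `calls` list shows that zipfile touches only the end-of-central-directory record, the central directory and the member's bytes. That access pattern is what makes ranged reads pay off on large archives.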
When to use this

  • You need random access into large objects over HTTP
  • You want to avoid full downloads and keep RAM small
  • The file is served by a standard HTTP server or CDN that supports Range requests

FAQ

Does it cache the whole file? No. It caches at most two chunks at a time.
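Bounding memory to two chunks amounts to an LRU cache of capacity two. The following is a minimal sketch using `collections.OrderedDict`. `ChunkLRU` is a hypothetical name, and the package's internal structure may differ. With this structure, a backward seek into the previous chunk is served from the cache instead of triggering a new request.

```python
from collections import OrderedDict


class ChunkLRU:
    """Keep at most `capacity` chunks (default 2: current + previous),
    evicting the least recently used one. Hypothetical sketch."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._chunks = OrderedDict()  # chunk_index -> bytes, oldest first

    def get(self, index):
        data = self._chunks.get(index)
        if data is not None:
            self._chunks.move_to_end(index)  # mark as most recently used
        return data

    def put(self, index, data):
        self._chunks[index] = data
        self._chunks.move_to_end(index)
        while len(self._chunks) > self.capacity:
            self._chunks.popitem(last=False)  # drop least recently used

    def clear(self):
        """Drop everything, e.g. after the remote file changed."""
        self._chunks.clear()

    def keys(self):
        return list(self._chunks)


cache = ChunkLRU()
cache.put(0, b"a")
cache.put(1, b"b")
cache.get(0)        # chunk 0 becomes most recently used
cache.put(2, b"c")  # evicts chunk 1, the least recently used
print(cache.keys())  # [0, 2]
```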

HTTP/2 or HTTP/3? The default transport is requests, which speaks HTTP/1.1 only. You can swap in your own transport if needed.
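Swapping the transport is easiest to picture as a small interface that the reader calls once per chunk. The `Transport` protocol, its `get_range` method and `MemoryTransport` below are hypothetical and are not the package's actual extension point. Check the source for the real hook. An in-memory implementation like this one is also handy for offline tests.

```python
from typing import Mapping, Optional, Protocol, Tuple


class Transport(Protocol):
    def get_range(self, url: str, start: int, end: int,
                  headers: Optional[Mapping[str, str]] = None) -> Tuple[int, dict, bytes]:
        """Fetch bytes [start, end) of `url`; return (status, headers, body)."""
        ...


class MemoryTransport:
    """Serves byte ranges from in-memory blobs, like a Range-aware server."""

    def __init__(self, files):
        self.files = files  # url -> bytes
        self.requests = []  # (url, start, end) log, useful in tests

    def get_range(self, url, start, end, headers=None):
        blob = self.files[url]
        self.requests.append((url, start, end))
        if start >= len(blob):
            # 416 Range Not Satisfiable, with the current size advertised
            return 416, {"Content-Range": f"bytes */{len(blob)}"}, b""
        end = min(end, len(blob))
        body = blob[start:end]
        return 206, {"Content-Range": f"bytes {start}-{end - 1}/{len(blob)}"}, body


def read_range(transport: Transport, url: str, start: int, end: int) -> bytes:
    """Fetch one range through any object that satisfies Transport."""
    status, _, body = transport.get_range(url, start, end)
    if status != 206:
        raise OSError(f"HTTP {status}")
    return body


t = MemoryTransport({"https://example.com/f.bin": bytes(range(10))})
print(read_range(t, "https://example.com/f.bin", 2, 5))  # b'\x02\x03\x04'
```

Because `Transport` is a structural `Protocol`, an httpx- or urllib3-based class would only need a matching `get_range` method.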

Thread safety? Intended for single-reader usage. The internal executor is only for prefetching.
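Using an executor only for prefetch keeps the consumer single-threaded. After serving chunk i, the reader schedules chunk i + 1 in the background and collects the future when it is needed. The following is a minimal sketch. It assumes a hypothetical `fetch_chunk(index)` function, and `Prefetcher` is illustrative rather than the package's API.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class Prefetcher:
    """Fetch chunk i on demand and chunk i + 1 in the background."""

    def __init__(self, fetch_chunk: Callable[[int], bytes], num_chunks: int):
        self._fetch = fetch_chunk
        self._num_chunks = num_chunks
        self._pool = ThreadPoolExecutor(max_workers=1)  # background fetches only
        self._pending: Dict[int, Future] = {}

    def get(self, index: int) -> bytes:
        fut = self._pending.pop(index, None)
        # Use the prefetched result if one was scheduled, else fetch inline.
        data = fut.result() if fut is not None else self._fetch(index)
        nxt = index + 1
        for stale in [k for k in self._pending if k != nxt]:
            self._pending.pop(stale).cancel()  # reader jumped elsewhere
        if nxt < self._num_chunks and nxt not in self._pending:
            self._pending[nxt] = self._pool.submit(self._fetch, nxt)
        return data

    def close(self):
        for fut in self._pending.values():
            fut.cancel()
        self._pending.clear()
        self._pool.shutdown(wait=True)


chunks = [b"aa", b"bb", b"cc"]
calls = []


def fetch_chunk(i):
    calls.append(i)
    return chunks[i]


p = Prefetcher(fetch_chunk, len(chunks))
out = b"".join(p.get(i) for i in range(3))
p.close()
print(out)  # b'aabbcc'
```

Only `get` is called from the reader's thread. The worker thread touches nothing but the fetch function, which matches the single-reader model described above.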

CLI demo

python -m examples.http_zip_demo --url https://github.com/psf/requests/archive/refs/heads/main.zip --list 10

Roadmap

  • Optional httpx transport (HTTP/2)
  • Adaptive prefetch sizing
  • Multi-range coalescing (multipart/byteranges) when beneficial
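Multi-range coalescing is on the roadmap, not in the current release. The following is a minimal sketch of the planning step only, assuming ranges are half-open `[start, end)` pairs. It merges ranges whose gap is at most `max_gap` bytes, then formats one multi-range header value. `coalesce` and `multi_range_header` are hypothetical names.

Parsing the `multipart/byteranges` reply is not shown. A server may also answer such a request with a single range or the full body instead.

```python
def coalesce(ranges, max_gap=0):
    """Merge half-open [start, end) ranges that overlap or lie within
    `max_gap` bytes of each other. Returns a sorted list of tuples."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend previous range
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


def multi_range_header(ranges):
    """Format a Range header value; HTTP byte ranges are inclusive."""
    return "bytes=" + ",".join(f"{s}-{e - 1}" for s, e in ranges)


spans = coalesce([(0, 100), (150, 200), (5000, 5100)], max_gap=64)
print(spans)                      # [(0, 200), (5000, 5100)]
print(multi_range_header(spans))  # bytes=0-199,5000-5099
```

The `max_gap` threshold trades a few wasted bytes for fewer round trips. Whether coalescing is "beneficial" depends on latency versus bandwidth, which is presumably why the roadmap qualifies it.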

License

MIT