Minimal, production-ready HTTP byte-range reader that behaves like a read-only file object. It supports 2‑chunk LRU caching, parallel prefetch, and clean random access into large remote files (think: ZIP archives, tarballs, parquet splits, ISO images) without downloading the whole object.
Python 3.9+. Transport: `requests` (HTTP/1.1).
- Single-file, zero runtime dependencies except `requests`
- 2-chunk LRU (current + previous) to reduce re-fetches on back-seeks
- Background prefetch of the next chunk for smooth sequential reads
- `If-Range` with `ETag`/`Last-Modified` to prevent mixing chunks after remote updates (see the fetch sketch below)
- Graceful fallback when servers ignore `Range` and reply 200 OK
- Works anywhere a file-like object works (`zipfile`, `tarfile`, `PIL.Image.open`, etc.)
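
Under the hood, a chunk fetch can be pictured roughly like this. This is an illustrative sketch, not the library's actual code; `fetch_chunk`, `session`, and `validator` are made-up names, and error handling is reduced to the essentials:

```python
import requests

def fetch_chunk(url, start, size, session=None, validator=None):
    """Fetch bytes [start, start+size) with a Range request (illustration only)."""
    sess = session or requests
    headers = {"Range": f"bytes={start}-{start + size - 1}"}
    if validator:
        # If-Range with an ETag or Last-Modified value: the server honors the
        # Range only if the resource is unchanged; otherwise it replies
        # 200 OK with the full, current body, so chunks never get mixed.
        headers["If-Range"] = validator
    resp = sess.get(url, headers=headers, timeout=30)
    resp.raise_for_status()
    if resp.status_code == 206:  # Partial Content: exactly the requested bytes
        return resp.content
    # 200 OK: the server ignored Range (or the object changed); slice locally.
    return resp.content[start:start + size]
```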
```
pip install http-range-reader
```

(or copy `src/http_range_reader/reader.py` directly into your project)
```python
from zipfile import ZipFile
from http_range_reader import HTTPRangeReader

url = "https://github.com/psf/requests/archive/refs/heads/main.zip"
rdr = HTTPRangeReader(url, chunk_size=1024*1024, prefetch=True)

with rdr:
    with ZipFile(rdr) as zf:
        print(len(rdr), "bytes over HTTP")
        print("first 5 entries:")
        for info in zf.infolist()[:5]:
            print("-", info.filename, info.file_size)
        data = zf.read(zf.infolist()[0].filename)
        print("read", len(data), "bytes from first member")
```
- You need random access into large objects over HTTP
- You want to avoid full downloads and keep RAM small
- Your HTTP servers/CDNs support standard Range requests (a quick check is shown below)
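
If you are not sure whether a server honors Range requests, a quick check like the following can help. The URL is illustrative, and note that some servers omit `Accept-Ranges` yet still honor `Range` on a real GET:

```python
import requests

head = requests.head("https://example.com/big.iso", allow_redirects=True, timeout=10)
print(head.headers.get("Accept-Ranges"))   # "bytes" means byte ranges are supported
print(head.headers.get("Content-Length"))  # total object size, when reported
```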
Does it cache the whole file? No. It caches at most two chunks at a time (sketched below).
HTTP/2 or HTTP/3? The default transport is `requests` (HTTP/1.1). You can swap in your own transport if needed.
Thread safety? Intended for single-reader usage. The internal executor is only for prefetching.
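
As an illustration of the two-chunk policy mentioned above (a conceptual sketch, not the library's actual implementation):

```python
from collections import OrderedDict

class TwoChunkCache:
    """Keep at most the two most recently used chunks, keyed by chunk index."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._chunks = OrderedDict()  # chunk_index -> bytes

    def get(self, index):
        data = self._chunks.get(index)
        if data is not None:
            self._chunks.move_to_end(index)  # mark as most recently used
        return data

    def put(self, index, data):
        self._chunks[index] = data
        self._chunks.move_to_end(index)
        while len(self._chunks) > self.capacity:
            self._chunks.popitem(last=False)  # evict the least recently used
```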
```
python -m examples.http_zip_demo --url https://github.com/psf/requests/archive/refs/heads/main.zip --list 10
```
- Optional `httpx` transport (HTTP/2)
- Adaptive prefetch sizing
- Multi-range coalescing (`multipart/byteranges`) when beneficial (illustrated below)
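
For context on the last item: a multi-range request asks for several byte ranges in one round trip, and servers that support it reply 206 with a `multipart/byteranges` body. A rough illustration with `requests` (not yet wired into the reader; the URL is illustrative):

```python
import requests

# Two discontiguous ranges in a single request.
resp = requests.get(
    "https://example.com/big.iso",
    headers={"Range": "bytes=0-1023,1048576-1049599"},
    timeout=30,
)
print(resp.status_code)                  # 206 if the server supports multi-range
print(resp.headers.get("Content-Type"))  # e.g. multipart/byteranges; boundary=...
```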