Download sqlite database without storing temporary archive
The sqlite download currently downloads the bz2 archive to a temporary file, and then extracts the database from that local file. This is not ideal for two reasons:
- it increases the disk space needed, as there has to be, at least momentarily, enough space to hold both the compressed archive and the uncompressed database
- it increases the time needed to download. Ideally the file would be decompressed as it is downloaded.
It appears this was done because the timestamp of the archive file is generated after it's downloaded, and is later compared against the remote file to avoid re-downloading identical files.
We could probably streamline this by using curl to get the remote last modified time via a HEAD request, and then downloading the archive, without a temporary file, immediately after.
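A rough sketch of what that could look like, assuming Node with curl, bunzip2 and bash available on the host. All function names here (parseLastModified, needsDownload, buildPipeline, download) are hypothetical and not taken from download_sqlite_all.js; it also assumes the server actually sends a Last-Modified header.

```javascript
// Hypothetical helpers for illustration only; not the existing implementation.
const { execFileSync, execSync } = require('child_process');
const fs = require('fs');

// Extract the Last-Modified time (ms since epoch) from raw `curl -I` output.
// Returns null when the header is missing or cannot be parsed.
function parseLastModified(headerText) {
  const match = /^last-modified:\s*(.+)$/im.exec(headerText);
  if (!match) return null;
  const ms = Date.parse(match[1].trim());
  return Number.isNaN(ms) ? null : ms;
}

// Download when the remote time is unknown, the local database is missing,
// or the local database is older than the remote archive.
function needsDownload(remoteMs, localPath) {
  if (remoteMs === null || !fs.existsSync(localPath)) return true;
  return fs.statSync(localPath).mtimeMs < remoteMs;
}

// Quote a value for safe use inside a single-quoted shell argument.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Build a pipeline that decompresses while downloading, so the .bz2 archive
// never touches disk. `pipefail` makes a curl failure fail the whole pipeline.
function buildPipeline(url, dest) {
  return `set -o pipefail; curl -sSfL ${shellQuote(url)} | bunzip2 > ${shellQuote(dest)}`;
}

// HEAD request first, then stream the archive straight into bunzip2.
// Returns true if a download happened, false if the local copy was current.
function download(url, dest) {
  const headers = execFileSync('curl', ['-sSfIL', url], { encoding: 'utf8' });
  const remoteMs = parseLastModified(headers);
  if (!needsDownload(remoteMs, dest)) return false;
  execSync(buildPipeline(url, dest), { shell: '/bin/bash', stdio: 'inherit' });
  // Stamp the database with the remote time so the next comparison is exact.
  if (remoteMs !== null) fs.utimesSync(dest, new Date(), new Date(remoteMs));
  return true;
}
```

One caveat with this approach: if the stream fails partway through, the destination file is left truncated, so a real version would probably want to write to a sibling path and rename on success.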
We have had issues piping curl into bunzip in the past. The temporary file isn't ideal, but it's proven itself to be stable.
Fixed since #417 (comment).
whosonfirst/utils/download_sqlite_all.js
Line 96 in a0bc28d