pelias/whosonfirst

Issue with pelias download wof: Corrupted SQLite during whosonfirst full planet data download

taminoelgert opened this issue · 2 comments

Describe the bug
When attempting a full planet build using Kubernetes, the pelias download wof command consistently throws the following error after downloading the whosonfirst data:

error: [whosonfirst] error downloading whosonfirst-data-admin-latest.db.bz2
Error: Command failed: curl -sA 'pelias-whosonfirst/0.0.0-development' https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-latest.db.bz2 | lbunzip2 > /data/whosonfirst/sqlite/whosonfirst-data-admin-latest.db
lbunzip2: stdin: compressed data error: bad block header magic

Steps to Reproduce

  • Use the full planet config.
  • Start the whosonfirst container with the ./bin/download command (pelias download wof).

Expected behavior

The pelias download wof command should download the whosonfirst data without encountering any errors.

Environment (please complete the following information):

  • Kubernetes environment with 32 cores and 64 GB RAM (on Kubernetes nodes).
  • Local environment with 24 cores and 32 GB RAM.
  • OS: [e.g. Linux]
  • Docker version 24.0.7, build afdd53b

Pastebin/Screenshots

pelias config:

{
      "logger": {
        "level": "info",
        "timestamp": true
      },
      "esclient": {
        "apiVersion": "7.x",
        "hosts": [
          {
            "protocol": "https",
            "host": "geocoder-es-http"
          }
        ]
      },
      "acceptance-tests": {
        "endpoints": {
          "docker": "http://pelias-api:4000/v1/"
        }
      },
      "api": {
        "services": {
          "placeholder": {
            "url": "http://pelias-placeholder:4100"},
          "interpolation": {
            "url": "http://pelias-interpolation:4300"},
          "libpostal": {
            "url": "http://pelias-libpostal:4400"}
        }
      },
      "imports": {
        "adminLookup": {
          "enabled": true
        },
        "geonames": {
          "datapath": "/data/geonames",
          "countryCode": "ALL"
        },
        "openstreetmap": {
          "download": [
            {
              "sourceURL": "https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf"}
          ],
          "leveldbpath": "/tmp",
          "datapath": "/data/openstreetmap",
          "import": [
            {
              "filename": "planet-latest.osm.pbf"
            }]
        },
        "openaddresses": {
          "datapath": "/data/openaddresses",
          "files": [
          ]
        },
        "polyline": {
          "datapath": "/data/polylines",
          "files": [
            "extract.0sv"]
        },
        "whosonfirst": {
          "datapath": "/data/whosonfirst",
          "importPostalcodes": true
        },
        "interpolation": {
          "download": {
            "tiger": {
              "datapath": "/data/tiger"
            }
          }
        }
      }
    }

Additional context

The issue can also be reproduced locally in a Docker environment by following the same steps up to the pelias download all command. Subsequent steps, such as placeholder prepare, fail because "the SQLite is corrupted."
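To tell a genuinely corrupted database apart from a failure in a later step, the decompressed file can be checked directly. The following is a minimal sketch using Python's standard `sqlite3` module; it is not part of pelias, and `check_sqlite` is a hypothetical helper name chosen for illustration.

```python
import sqlite3

def check_sqlite(path):
    """Return True if SQLite reports the database file at `path` as intact."""
    try:
        # Open read-only so a missing file raises an error
        # instead of silently creating an empty database.
        conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    except sqlite3.Error:
        return False
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        # Raised for files that are malformed or not SQLite at all.
        return False
    finally:
        conn.close()
    # A healthy database yields exactly one row containing "ok".
    return rows == [("ok",)]
```

Calling `check_sqlite("/data/whosonfirst/sqlite/whosonfirst-data-admin-latest.db")` (the path from the error message above) should return `False` for a truncated or garbled file. On a database of this size the full integrity check may take a while.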

References

Thank you for your assessment

Hi @taminoelgert, I wasn't able to reproduce this issue.

It might have been an intermittent connection issue with our CDN provider (https://bunny.net/).
Could you please confirm if the issue has resolved itself?

aria2c https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-latest.db.bz2

03/11 15:29:45 [NOTICE] Downloading 1 item(s)
 *** Download Progress Summary as of Mon Mar 11 15:30:47 2024 ***
=============================================================================
[#b8d2ec 6.2GiB/8.0GiB(78%) CN:1 DL:92MiB ETA:19s]
FILE: /tmp/whosonfirst-data-admin-latest.db.bz2
-----------------------------------------------------------------------------

[#b8d2ec 7.9GiB/8.0GiB(98%) CN:1 DL:108MiB]
03/11 15:31:06 [NOTICE] Download complete: /tmp/whosonfirst-data-admin-latest.db.bz2

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
b8d2ec|OK  |   105MiB/s|/tmp/whosonfirst-data-admin-latest.db.bz2

Status Legend:
(OK):download completed.

lbunzip2 -t whosonfirst-data-admin-latest.db.bz2

echo $?
0
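For environments where lbunzip2 is not at hand, the same kind of integrity test can be approximated with Python's standard `bz2` module. This is only a sketch under that assumption: `bz2_is_intact` is a hypothetical helper, and it runs single-threaded, so it will be much slower than lbunzip2 on an 8 GiB archive.

```python
import bz2

def bz2_is_intact(path, chunk_size=1 << 20):
    """Stream-decompress `path`, discarding the output.

    Returns True if the whole archive decompresses cleanly and
    False if it is missing, truncated, or contains invalid data.
    """
    try:
        with bz2.open(path, "rb") as fh:
            # Read in chunks so the decompressed data never sits in memory at once.
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError):
        # OSError: invalid data stream or unreadable file.
        # EOFError: archive ended before the end-of-stream marker.
        return False
    return True
```

Running this on the downloaded `.db.bz2` file before decompressing it would catch an interrupted download early, rather than leaving a partial `.db` file for later steps to trip over.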

Thanks for the reply. I have just tried again and it now seems to be working without any problems. Thanks for the help; I'll close the ticket.