felt/tippecanoe

-pf -pk with global data at zoom levels > 12


I am not sure this is a bug. Right now I am trying to establish why I am seeing unexpected behaviour from data produced by tippecanoe.

The backstory is that I am piping the entirety of the Who's On First dataset, plus a bit more, into tippecanoe and converting the resultant MBTiles database into a Protomaps PMTiles database.

I am doing this in order to use the PMTiles database as a fast and cheap point-in-polygon service: derive the tile for a point, fetch the features from that tile, do ray casting in memory.
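Roughly, the per-point lookup amounts to the sketch below. This is illustrative only, not code from those repositories: the tile address comes from the standard slippy-map formula, the polygon test is the usual even-odd ray casting rule, and all function names are made up.

package main

import (
	"fmt"
	"math"
)

// tileForPoint converts a lon/lat pair to slippy-map (Web Mercator)
// tile coordinates at the given zoom level.
func tileForPoint(lon, lat float64, z int) (x, y int) {
	n := math.Exp2(float64(z))
	latRad := lat * math.Pi / 180.0
	x = int(n * (lon + 180.0) / 360.0)
	y = int(n * (1.0 - math.Log(math.Tan(latRad)+1.0/math.Cos(latRad))/math.Pi) / 2.0)
	return x, y
}

// pointInRing applies the even-odd rule: cast a ray from the point and
// flip "inside" each time the ray crosses an edge of the ring.
func pointInRing(lon, lat float64, ring [][2]float64) bool {
	inside := false
	for i, j := 0, len(ring)-1; i < len(ring); j, i = i, i+1 {
		xi, yi := ring[i][0], ring[i][1]
		xj, yj := ring[j][0], ring[j][1]
		if (yi > lat) != (yj > lat) &&
			lon < (xj-xi)*(lat-yi)/(yj-yi)+xi {
			inside = !inside
		}
	}
	return inside
}

func main() {
	// A point in the Richmond district of San Francisco.
	fmt.Println(tileForPoint(-122.49, 37.775, 14)) // 2617 6332
}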

All of which works fine when I produce tiles at zoom 12. For example:

$> features -as-spr -require-polygons -writer-uri constant://?val=jsonl://?writer=stdout:// -iterator-uri org:///usr/local/data -spr-append-property wof:hierarchy whosonfirst-data://?prefix=whosonfirst-data-admin- sfomuseum-data://?prefix=sfomuseum-data-architecture | tippecanoe -P -z 12 -pf -pk -o /usr/local/data/whosonfirst_sfom.mbtiles

The features tool is part of https://github.com/whosonfirst/go-whosonfirst-tippecanoe and iterates through one or more WOF data repositories in an organization, outputting features with polygons to STDOUT.

I decided to try the same thing at zoom 14, in order to determine whether the number of features in any given tile would have a noticeable impact on speed and the time to do ray casting. It took a while (not surprisingly) but eventually completed without, seemingly, any errors.

However, when I try to query the resultant PMTiles database I am getting the Protomaps equivalent of a 404 error (204, no data). For example, this tile in the Richmond / GG Park area of San Francisco:

$> ./bin/pmtile -database whosonfirst_sfom_14 -tiles 's3blob://{BUCKET}?region={REGION}&prefix={PREFIX}&credentials=session' -z 14 -x 2617 -y 6332
2022/12/09 11:18:59 fetching whosonfirst_sfom_14 0-16384
2022/12/09 11:18:59 fetched whosonfirst_sfom_14 0-0
2022/12/09 11:18:59 fetching whosonfirst_sfom_14 12513321-11279
2022/12/09 11:19:00 fetched whosonfirst_sfom_14 12513321-11279
2022/12/09 11:19:00 /whosonfirst_sfom_14/14/2617/6332.mvt returns status code 204

The pmtile tool is part of https://github.com/whosonfirst/go-whosonfirst-spatial-pmtiles/ and dumps the contents of a tile as a GeoJSON FeatureCollection to STDOUT.

Unfortunately, as I write this, I don't have the intermediate MBTiles database because it gets removed when the container doing all the work completes. Judging by the logs, Protomaps seemed perfectly happy with the MBTiles data.

Curiously, the resultant PMTiles database is 7.1GB, so I am scratching my head to understand what is or isn't in there.

Does any of what I've just described trigger any "Oh yeah, that's expected..." thoughts or, absent any obvious errors being reported, any ideas how to think about debugging this?

If zoom level 12 is the functional limit then I can live with that.

bdon commented

A 204 is different from a 404: a 204 means the zoom level is within the valid range for the archive, but there is no tile there (the empty set of features). What is the fallback logic in the MBTiles case when there is no data at zoom 14?

This doesn't sound like a tippecanoe issue.
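One common shape for the fallback bdon is asking about, sketched with a hypothetical getTile helper rather than anything from the actual codebases, is to walk up the tile pyramid until a non-empty tile turns up:

// parentTile steps one zoom level up: tile (z, x, y) is covered by
// tile (z-1, x/2, y/2).
func parentTile(z, x, y int) (int, int, int) {
	return z - 1, x / 2, y / 2
}

// fetchWithFallback retries empty (204) tiles against successive
// parent tiles until minZoom. getTile is a hypothetical fetcher that
// returns nil when the archive has no tile at the given address.
func fetchWithFallback(getTile func(z, x, y int) []byte, z, x, y, minZoom int) []byte {
	for z >= minZoom {
		if data := getTile(z, x, y); data != nil {
			return data
		}
		z, x, y = parentTile(z, x, y)
	}
	return nil
}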

Yeah, it was incorrect of me to say that the 204 status code in Protomaps meant "not found". Still, it seems weird that the tile mentioned above doesn't contain data that should be there (at a minimum the city of SF and higher).

I've been rebuilding the databases over the weekend and, without any further investigation, can see that the database file for zoom 13 is larger than the "funky" database for zoom 14, which suggests...something?

I will report back when I've had a chance to poke at it some more.

The zoom 13 database works as expected. That, and the fact that the z13 database is larger than the z14 database, suggests a resource contention or failure somewhere in the processing stack. It's very possible it was related to harvesting all the WOF documents. I would have expected to see something about that in the logs, but who knows.

I need to think about how much of an AWS-ECS-disk-space hassle it will be to write the WOF data to a static file, rather than to STDOUT piped directly to tippecanoe, in order to determine whether that's where the problems are manifesting themselves.
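For reference, the two-step variant would look something like this (paths illustrative, feature flags elided; tippecanoe accepts input files as arguments in place of STDIN):

$> features -as-spr -require-polygons ... > /usr/local/data/features.jsonl
$> tippecanoe -P -z 14 -pf -pk -o /usr/local/data/whosonfirst_sfom_14.mbtiles /usr/local/data/features.jsonl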

In the meantime, the z13 database works as expected, so I am closing this issue.

Maybe? Does the (go-pmtiles) convert tool write to /tmp? The ECS instance in question had the largest ephemeral disk possible (200GB), so it seems unlikely. Maybe this, too?

protomaps/go-pmtiles#32

bdon commented

It uses the Go standard library temp file facilities, which will write to the system temp dir. Seems like we do need a way to make it configurable, or to default to the directory being written to.
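For reference, the standard library behaviour in question: os.CreateTemp falls back to os.TempDir(), which honours $TMPDIR on Unix, when its directory argument is empty; making it configurable just means threading a directory through. A minimal sketch with a hypothetical flag name:

package main

import (
	"flag"
	"fmt"
	"log"
	"os"
)

func main() {
	// Hypothetical flag; go-pmtiles does not necessarily expose this.
	tmpDir := flag.String("tmpdir", "", "directory for scratch files; empty means the system temp dir")
	flag.Parse()

	// os.CreateTemp uses os.TempDir() when its directory argument is
	// empty. That fallback is what fills the system temp dir during a
	// large convert run.
	f, err := os.CreateTemp(*tmpDir, "convert-*.tmp")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	fmt.Println("scratch file:", f.Name())
}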