whosonfirst/whosonfirst-www-spelunker

Inconsistent feature counts between WOF features and GeoJSON collection

Closed this issue · 10 comments

I navigated the Spelunker to the record for Russia and clicked the "Download Descendants of Russia" link, which led me here.

The count shown for counties was 2,271 features, but the GeoJSON collection only contained 1,641 features. I tried a second time; the second GeoJSON bundle only contained 1,571 features.

Here's a screen shot of the GeoJSON bundle (light blue) over what was expected (light brown):

[Image: screen shot 2017-01-19 at 5 01 11 pm]

It may be a coincidence, but both GeoJSON bundles I grabbed from the Spelunker seemed to be missing larger geometries. Also, the progress seemed to slow significantly around 80%.

Screenshot of the Spelunker, for what it's worth:

[Image: screen shot 2017-01-19 at 4 57 31 pm]

Weird, I was able to grab all 2,271 counties. I can send you the GeoJSON file to make sure it has what you were looking for. I double-checked the count like this:

cat wof_bundle_85632685_county.geojson | jq ".features | length"

Interesting... I downloaded a bundle of county features in Russia just now and the file contained 1,528 features. At first glance, it looks like the same large geometries are missing.

If you send over your GeoJSON, I can compare it to a file of what I expect to see.

Can you tell which counties are missing?

No, not without a bundle of expected features to compare the bundle to...

Actually, the GeoJSON bundle could be joined to the CSV summary. This would list the missing wof:ids.

The CSV summary seems to include all features; the GeoJSON bundle does not.
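
For anyone who wants to repeat that join, here is a minimal sketch in Python. It assumes the CSV summary has an `id` column holding the wof:id and that each GeoJSON feature stores its ID under `properties["wof:id"]`; both are assumptions about the file layout, and `missing_ids` is a hypothetical helper, not part of the Spelunker.

```python
import csv
import io
import json


def missing_ids(csv_text, geojson_text, id_column="id"):
    """Return IDs listed in the CSV summary but absent from the GeoJSON bundle.

    Assumptions (not confirmed in this thread): the CSV column is named
    `id`, and each feature carries its ID in properties["wof:id"].
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    # A set collapses any repeated CSV rows down to one entry per ID.
    expected = {row[id_column].strip() for row in rows}

    collection = json.loads(geojson_text)
    found = {str(f["properties"]["wof:id"]) for f in collection["features"]}

    # IDs the summary promises but the bundle does not deliver.
    return sorted(expected - found)
```

To run it against real downloads, read both files into strings first (for example with `open(path, encoding="utf-8").read()`) and pass them in; the file names are whatever the Spelunker gave you.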

If you can pull out the missing features, that would help with debugging.

Interesting... after joining, I realized the CSV summary file actually contains 496 duplicate features.

The total count in the CSV summary is 2,271 features (matching the feature count in WOF), but those 496 duplicates equal the number of features missing from the GeoJSON bundle (the bundle contains 1,775 features; 2,271 - 496 = 1,775).
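
A rough way to check the CSV summary for repeated rows, sketched in Python. As before, the `id` column name is an assumption about the summary layout, and `duplicate_report` is a hypothetical helper written for illustration.

```python
import csv
import io
from collections import Counter


def duplicate_report(csv_text, id_column="id"):
    """Summarize repeated IDs in a CSV summary.

    Returns (total_rows, unique_ids, extra_rows, repeated_ids), where
    extra_rows is the number of rows beyond the first for each ID.
    The `id` column name is an assumption, not confirmed in this thread.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[id_column].strip() for row in reader)

    total_rows = sum(counts.values())
    unique_ids = len(counts)
    extra_rows = total_rows - unique_ids
    repeated_ids = sorted(i for i, n in counts.items() if n > 1)
    return total_rows, unique_ids, extra_rows, repeated_ids
```

If the numbers above hold, a summary with 2,271 rows and 496 extra rows would leave 1,775 unique IDs, which is the feature count seen in the bundle.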

So after some testing I was able to reproduce this. @thisisaaronland and I worked with it for a bit and I think we've fixed the underlying problem. Could you try re-downloading those features and see if you get something more reasonable?

Thanks for the test case, btw. It's kind of an easy one to miss.

I downloaded all campus records parented by the United States in one bundle, and all counts matched what was expected. I also downloaded the same set of county records parented by Russia; again, all counts matched what was expected. (!)

I will keep testing the descender tool with new bundles, but it looks like the original issue is now fixed.

Gonna close this, we can open it again if we see the issue crop up again.