whosonfirst/whosonfirst-properties

Duplicate property json files

Opened this issue · 2 comments

When cloning this repo, Git warns that several JSON files have names that differ only by case (lowercase vs. uppercase):

  • 'properties/ne/ADM0_A3.json' / 'properties/ne/adm0_a3.json'
  • 'properties/ne/FEATURECLA.json' / 'properties/ne/featurecla.json'
  • 'properties/ne/ISO_A2.json' / 'properties/ne/iso_a2.json'
  • 'properties/ne/LABELRANK.json' / 'properties/ne/labelrank.json'
  • 'properties/ne/SCALERANK.json' / 'properties/ne/scalerank.json'
  • 'properties/ne/SOV_A3.json' / 'properties/ne/sov_a3.json'

Should these files be combined? If so, should records be updated to reflect the casing of the property name?

Great question! This doc has a good explanation about what's going on:

My guess is that some WOF records use ne:ADM0_A3 and others use ne:adm0_a3, so separate property JSON files were created, because the raw files are case-sensitive text.

Going forward, should we enforce by convention that all properties are lowercase? This particular drift happened in Natural Earth because the tooling changed between versions: some versions were all lowercase, some uppercase, and some mixed the two. In the DBF format used for Shapefiles, the casing doesn't matter.

The problem arises when your operating system's filesystem is case-insensitive and can't keep both of those as separate files. So the checkout warns that only one of the two will be checked out locally. Looking at the ADM0_A3 example, the two JSON files contain slightly different contents. Vicchi's changes with the pattern addition should be kept by copying them over to the lowercase-named property JSON file.
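For anyone who wants to list these collisions without relying on the clone warning, here is a minimal sketch. It assumes you already have the repository's paths as a list of strings (for example from `git ls-files`); the function name `find_case_collisions` is made up for illustration and is not part of any WOF tooling.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Group paths that differ only by letter case.

    Returns a dict mapping each case-folded path to the sorted list of
    original paths that share it. Only groups with two or more members
    are included.
    """
    groups = defaultdict(set)
    for path in paths:
        groups[path.casefold()].add(path)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Two paths from the list in this issue, plus one hypothetical path
# that has no case variant.
sample = [
    "properties/ne/ADM0_A3.json",
    "properties/ne/adm0_a3.json",
    "properties/ne/example_only.json",  # hypothetical, for illustration
]

for names in find_case_collisions(sample).values():
    print(" / ".join(names))
```

Working from the path list rather than the checked-out directory matters here: on a case-insensitive filesystem the working tree only contains one file of each pair, so walking the directory would not show the duplicates.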

Generally we should standardize on all lowercase, and the properties in all the WOF records would also need to be scrubbed (a crawl across all the repos, in addition to a change in this repo).
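The per-record part of that scrub could look roughly like the sketch below. It assumes a record's properties are available as a plain dict keyed by names such as `ne:ADM0_A3`; the helper name `lowercase_prefixed_keys` and the sample values are hypothetical. When both casings are present with different values, it raises instead of picking a winner, since that choice probably needs a human.

```python
def lowercase_prefixed_keys(properties, prefix="ne:"):
    """Return a copy of `properties` with keys under `prefix` lowercased.

    Keys outside the prefix are copied unchanged. Raises ValueError if two
    keys collapse to the same name but carry different values.
    """
    result = {}
    for key, value in properties.items():
        new_key = key.lower() if key.startswith(prefix) else key
        if new_key in result and result[new_key] != value:
            raise ValueError(f"conflicting values for {new_key!r}")
        result[new_key] = value
    return result


# Hypothetical record properties, for illustration only.
record_properties = {"ne:ADM0_A3": "USA", "wof:name": "United States"}
print(lowercase_prefixed_keys(record_properties))
```

Returning a new dict rather than mutating in place makes it easy to diff the before and after for each record during a crawl.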

I agree on standardizing on lower case.