datasets/world-cities

Improvements / clarifications on contents and fields (subcountry, geopoint etc)

Comments from openspending/cosmopolitan#25 (comment)

  • At least should include standard name and the English variant
  • I assume that subcountry equals what is "region" in geonames, but "subcountry" is confusing terminology (for me)
  • We also take location (geopoint) and population from geonames, but they are not present here. I grant that there are likely better data sources, particularly for population.

@lexman any thoughts? I know we already have #3 re native name. What about the other two points?

@pwalsh @lexman re city population: note that we have https://github.com/datasets/population-city

Hello @pwalsh,

At least should include standard name and the English variant
Actually, the name field holds the English variant, and you can treat it as the standard name (at least for people outside the country).

I assume that subcountry equals what is "region" in geonames, but "subcountry" is confusing terminology (for me)

I understand this is really confusing: not all countries have the same administrative divisions, so I relied on GeoNames' work. The documentation of the datapackage says:

Subcountry can be the name of a state (eg in United Kingdom or the United States of America) or the major administrative section (eg "region" in France). See admin1 field on geonames website (http://www.geonames.org/) for further info about subcountry.

Is that understandable? Would it be better if we reused the name admin1 from GeoNames for this column?

We also take location (geopoint) and population from geonames, but they are not present here. I grant that there are likely better data sources, particularly for population.

  • For population, I'm glad that @rgrp found https://github.com/datasets/population-city. I had searched for it for a long time because it was mentioned in datasets/awesome-data#30. Maybe I should check that we can join both datasets.
  • About location: I really like the simplicity of this dataset. Instead of adding two columns to it, what about a GeoJSON file in the same datapackage? What do you think, @pwalsh @rgrp?
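
To make the two ideas above more concrete, here is a rough Python sketch, not a tested implementation against the real files. It assumes world-cities rows carry name, country, subcountry and geonameid, and it guesses that population-city can be keyed on city and country; those column names, the sample rows, and the helper names (read_rows, join_coverage, to_geojson) are all hypothetical. The first function reports which cities would find a match in a naive name-and-country join. The second builds a GeoJSON FeatureCollection from a separate geonameid-to-coordinates lookup, so the CSV itself stays unchanged.

```python
import csv
import io

# Hypothetical sample data: ids, populations and column names are made up
# for illustration and may not match the real datasets.
WORLD_CITIES_CSV = """name,country,subcountry,geonameid
Paris,France,Île-de-France,1
Lyon,France,Auvergne-Rhône-Alpes,2
Springfield,United States,Illinois,3
"""

POPULATION_CSV = """city,country,population
Paris,France,1000
springfield,United States,2000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def _key(name, country):
    """Normalise a (city, country) pair for a case-insensitive comparison."""
    return (name.strip().casefold(), country.strip().casefold())


def join_coverage(cities, populations):
    """Split cities into those with and without a population row."""
    pop_keys = {_key(p["city"], p["country"]) for p in populations}
    matched, unmatched = [], []
    for city in cities:
        target = matched if _key(city["name"], city["country"]) in pop_keys else unmatched
        target.append(city)
    return matched, unmatched


def to_geojson(cities, coords):
    """Build a FeatureCollection; coords maps geonameid -> (lat, lon)."""
    features = []
    for city in cities:
        if city["geonameid"] not in coords:
            continue  # skip cities without a known location
        lat, lon = coords[city["geonameid"]]
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"geonameid": city["geonameid"], "name": city["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


cities = read_rows(WORLD_CITIES_CSV)
matched, unmatched = join_coverage(cities, read_rows(POPULATION_CSV))
print(len(matched), "matched;", len(unmatched), "unmatched")
```

A name-based join like this will miss spelling variants and cannot tell apart two cities with the same name in one country, so the unmatched list is only a first indication of how well the two datasets line up. Keeping coordinates in a separate GeoJSON file keyed by geonameid would let the CSV keep its current columns.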