mar-muel/local-geocode

Swiss cities decoded as cantons


gc.decode("Geneva")

[{'name': 'Geneva',
  'official_name': 'Canton de Genève',
  'country_code': 'CH',
  'longitude': 6.11044,
  'latitude': 46.19673,
  'geoname_id': '2660645',
  'location_type': 'admin1',
  'population': 506343}]

Geneva should be decoded as a city (or the city should at least be included as an option).

Hi! Not a bug, but a configuration issue.

Geneva seems to be just below the 200k "large city" population cutoff.

See here: https://github.com/mar-muel/local-geocode?tab=readme-ov-file#configuration

Can you try again using a cutoff of 150k?

from geocode.geocode import Geocode

# Lower the "large city" population cutoff to 150k
gc = Geocode(large_city_population_cutoff=150_000)
gc.load()

Yes, I tried; it still doesn't work.

The problem is apparently in these two lines in encoder.py:

# drop name duplicates by keeping only the high priority elements
df['name_lower'] = df['name'].str.lower()
df = df.drop_duplicates('name_lower', keep='first')

Geneva is both a canton and a city (like Zurich, Bern, etc.). Because cantons are given higher priority here (i.e., a lower priority index), only the canton row survives the de-duplication, so only the canton is returned.
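To make the effect concrete, here is a minimal pure-Python sketch of what sorting by priority and then calling `drop_duplicates('name_lower', keep='first')` does to rows that share a name. The rows, the `priority` numbers, and the helper `drop_duplicate_names` are illustrative assumptions, not data or code from the encoder.

```python
# Pure-Python equivalent of sorting by priority and then
# df.drop_duplicates('name_lower', keep='first').
# Rows and priority numbers are illustrative, not real encoder data.

rows = [
    {"name": "Geneva", "location_type": "city", "priority": 4},
    {"name": "Geneva", "location_type": "admin1", "priority": 2},
    {"name": "Lausanne", "location_type": "city", "priority": 4},
]


def drop_duplicate_names(rows):
    """Keep only the first row per lower-cased name, after sorting by priority."""
    seen = set()
    kept = []
    # Lower priority index = higher priority, so sort ascending (stable sort).
    for row in sorted(rows, key=lambda r: r["priority"]):
        key = row["name"].lower()
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


for row in drop_duplicate_names(rows):
    print(row["name"], row["location_type"])
# Geneva admin1
# Lausanne city
```

The city row for Geneva is silently discarded because the canton row with the same lower-cased name sorts first.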

I overrode the method to exclude these lines, so the encoder now returns both the city and the canton. This works for me, but perhaps we should think about how to apply this in general.
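One general alternative, sketched here under assumptions, is to keep every row that shares a name instead of dropping duplicates, ordered by priority, and let the caller see all candidates. `group_candidates` and `decode_all` are hypothetical helpers for illustration only; they are not part of local-geocode and do not reflect how `gc.decode` works internally.

```python
from collections import defaultdict

# Illustrative rows (same shape as a toy encoder table; not real data).
rows = [
    {"name": "Geneva", "location_type": "city", "priority": 4},
    {"name": "Geneva", "location_type": "admin1", "priority": 2},
    {"name": "Lausanne", "location_type": "city", "priority": 4},
]


def group_candidates(rows):
    """Hypothetical: map each lower-cased name to all its rows, best priority first."""
    candidates = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["priority"]):
        candidates[row["name"].lower()].append(row)
    return dict(candidates)


def decode_all(candidates, query):
    """Hypothetical lookup returning every candidate (empty list if unknown)."""
    return candidates.get(query.lower(), [])


candidates = group_candidates(rows)
print([r["location_type"] for r in decode_all(candidates, "Geneva")])
# ['admin1', 'city']
```

The trade-off is that any downstream code expecting exactly one match per name would have to choose among the candidates itself, for example by taking the first one to keep today's behavior.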

Thanks for the excellent project, BTW! It is a lifesaver.

I was hoping that lowering the large-city cutoff would give the city of Geneva a higher priority, placing it above the admin1 level (see the prioritization below). I'll need to look into what's going on here, but I'm happy you found a solution that works for you!

        # Priorities
        # 1) Large cities (population size > large_city_population_cutoff)
        # 2) States/provinces (admin_level == 1)
        # 3) Countries (admin_level == 0)
        # 4) Places
        # 5) Counties (admin_level > 1)
        # 6) Continents
        # 7) Regions
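The priority order above can be read as a small ranking function. The sketch below assumes hypothetical row fields (`kind`, `admin_level`, `population`) and a 200k default cutoff, so it is an illustration of the listed rules rather than the encoder's actual code. It shows why lowering the cutoff was expected to help: a city just below 200k moves from "place" (4) to "large city" (1), ahead of the canton (2), but only if the row is recognized as a city in the first place.

```python
def priority(row, large_city_population_cutoff=200_000):
    """Hypothetical ranking following the listed order (lower = higher priority)."""
    kind = row["kind"]
    if kind == "city" and row.get("population", 0) > large_city_population_cutoff:
        return 1  # large city
    if kind == "admin":
        level = row["admin_level"]
        if level == 1:
            return 2  # state/province (e.g. a canton)
        if level == 0:
            return 3  # country
        return 5  # county or lower administrative level
    if kind == "continent":
        return 6
    if kind == "region":
        return 7
    return 4  # any other place, including cities below the cutoff


# Hypothetical rows; the city population is made up for illustration.
city = {"name": "Example City", "kind": "city", "population": 190_000}
canton = {"name": "Canton de Genève", "kind": "admin", "admin_level": 1, "population": 506_343}

print(priority(city), priority(city, 150_000), priority(canton))
# 4 1 2
```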

I think the reason is that, in the original GeoNames dataset, Geneva is for some reason classified as the "administrative center of the corresponding canton" rather than as a "city". I'll take a closer look today.
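GeoNames marks populated places with feature class `P` and distinguishes plain places (`PPL`) and capitals (`PPLC`) from seats of administrative divisions (`PPLA` for a first-order division, `PPLA2` to `PPLA4` for lower orders). If the city check only matches the plain codes, an administrative seat would never qualify as a "large city", whatever the cutoff. Which codes local-geocode actually treats as cities is an assumption here, and `is_city` is a hypothetical helper showing how admin seats could be included.

```python
# Hypothetical city check based on GeoNames feature codes.
# Which codes local-geocode really uses is NOT confirmed here.
STRICT_CITY_CODES = {"PPL", "PPLC"}  # populated place, capital
ADMIN_SEAT_CODES = {"PPLA", "PPLA2", "PPLA3", "PPLA4"}  # seats of admin divisions


def is_city(feature_class, feature_code, include_admin_seats=True):
    """Return True if a GeoNames entry should count as a city."""
    if feature_class != "P":
        return False
    codes = (STRICT_CITY_CODES | ADMIN_SEAT_CODES) if include_admin_seats else STRICT_CITY_CODES
    return feature_code in codes


print(is_city("P", "PPLA"), is_city("P", "PPLA", include_admin_seats=False))
# True False
```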