coderholic/django-cities

Models - point vs area

Opened this issue · 22 comments

I noticed the models have a PointField instead of a polygon
Is there a good strategy in the future to make use of GIS enabled databases or the GeoDjango features?

Also if this is a point field what is the point in reference to?

blag commented

I noticed the models have a PointField instead of a polygon
Is there a good strategy in the future to make use of GIS enabled databases or the GeoDjango features?

As far as I'm aware, PointField do make use of GIS enabled databases, although we do support databases without GIS support (in which case we use a simple hypotenuse calculation to figure out the closest encompassing feature).

Also if this is a point field what is the point in reference to?

My assumption is the centroid of the polygon of the feature, but that would be best answered by Geonames. The PointField is calculated from the latitude and longitude fields from the Geonames dump.

latitude : latitude in decimal degrees (wgs84)
longitude : longitude in decimal degrees (wgs84)

You can find more information from:

the geonames faq :

http://forum.geonames.org/gforum/forums/show/6.page

The forum : http://forum.geonames.org

or the google group : http://groups.google.com/group/geonames

I would definitely accept a PR that added a BoundaryField and imported the polygon data from Geonames. It would probably be easiest to use their shapes_simplified_low.json.zip and use Python's geojson package to import the boundaries for the features we already import.

Fair enough. I am geofencing and need to have the polygons for more exact tracking.

If I can keep this issue open for a while I will see if we can bring in the boundary/topology data.

blag commented

@solvire Sure, we can keep this issue open. 😄

The project is paused so if I can't get a proposition in within a month I'll go ahead an update.
I'll have a note on my calendar to do so.
Thanks for entertaining my request.

Hello all!

Boundaries would rock. I also need them for a bunch of other things, like determining if a point belongs to a certain city.

I'll see what I can do to create PR.

blag commented

@george-silva Let me know if you have any questions or need any help. This would be a fantastic addition.

@blag I still haven't had time to work on this, but I might this week.

Looking forward to this. If this is done I can ditch the other package I work with for cities and use just this one.

Thanks!

blag commented

I'm curious: what other package are you using in conjunction with this one?

And I'm in the middle of a job hunt, so I probably won't be much use to you (in terms of writing code) for a few weeks. I'll try to review code for you though. Thanks for taking this on! 👍 ❤️

@blag I'm using django-municipios, which is Brazil only.

No problem. The code review will be valuable.

blag commented

I'm glad that django-municipios exists, but too bad is Brazil only. Are the shapefiles that django-municipios imports supplied by users or does it import shapefiles from somewhere?

I've been thinking about how I would implement this feature for a few weeks, here's some things I would consider:

  • Like Django does, I would treat the PostGIS implementation as the "gold standard" for boundary handling. For instance, PostGIS supports MultiPolygon fields (eg: polygons with holes) but vanilla PostgreSQL only supports simple Polygon fields (eg: polygons without holes). PostGIS seems to be the most featureful and stable GIS-enabled database extension.
  • If at all possible, try to also support Spatialite databases. I'm not too worried about supporting MySQL database variants, because most users probably aren't using MySQL for GIS queries to begin with.
  • Add the boundary field to the Place abstract model, that way all subclasses (which is every model but AlternativeName) automatically get boundaries with consistent semantics.
  • Adding the boundary fields is probably the easy part. The more difficult part to write is the import script. I'd like to allow users to specify exactly which countries they want to import data from, since the import script already takes a long time to complete.
  • Keep in mind that some users may not run the migration/s that add boundary fields to models before they run the import script, so check for the boundary field on the model before trying to import it. See this line for how I handle district models without a code field.
  • Keep the boundary field nullable, as I don't want to force users with existing data to import boundaries during migrations (I ran into this issue when I added slugs). Migrations to me should only change the database schema, not run a huge import script with a lot of potential to fail in various ways.
  • In the import script, I think would implement importing boundaries as it's own option (eg: python manage.py cities --import=boundary), and I would not import boundaries in the --import=all option. This caveat will need to be documented.
  • Please include test/s for the import script with a minimum of boundary data (see this file with cherry picked test data. It doesn't need to be an in-depth test, something like this would be fine:
    class TestImportBoundaryCommand(TestCase):
        def test_import_boundary(self):
            # Before we import boundaries, check that we have none
            self.assertEqual(Country.objects.exclude(boundary=None).count(), 0)
    
            ... # Call import boundary command
    
            # Check that we successfully imported some boundaries, even though we really don't care what boundaries we imported
            self.assertGreaterThan(Country.objects.exclude(boundary=None).count(), 0)
  • I would suggest using GEOMETRY datatypes instead of GEOGRAPHY datatypes because I'd like this project to apply worldwide. I would rather this project store data without the need for projections even if it means there's a performance hit. See this section of the PostGIS documentation for details.
  • See the PostGIS overview of datatypes to figure out which one to pick. Specifically, I don't know if a simple Polygon field will work or if a MultiPolygon field would be better. The built-in GIS fields all correspond to GIS field types, so check out the the polygon vs. multi-polygon examples in this section of the PostGIS documentation for details.
  • Do you want to support historical boundaries? If yes, I would suggest utilizing django-field-history on the boundary field. Doing so will automatically save a revision to modified boundaries every time the import command is run, and if the boundary isn't modified it won't save a revision.
  • If you do want to save revisions, saving revisions to just the boundary field is much preferable to saving revisions of the whole model.

Don't let any of these confuse you or frighten you off - deal with them as issues come up. And feel free to ping me if you have any questions or anything.

Hello @blag. Let me respond to you point by point.

First of all:

  1. I'm having difficulties running makemigrations. How are you doing that? I've tried using the test_project manage.py and example_project manage.py without success.

Like Django does, I would treat the PostGIS implementation as the "gold standard" for boundary handling. For instance, PostGIS supports MultiPolygon fields (eg: polygons with holes) but vanilla PostgreSQL only supports simple Polygon fields (eg: polygons without holes). PostGIS seems to be the most featureful and stable GIS-enabled database extension.

I'm targeting PostGIS as the gold standard. And the field already is a MultiPolygon. It should be a MultiPolygon to support countries with islands, for example.

If at all possible, try to also support Spatialite databases. I'm not too worried about supporting MySQL database variants, because most users probably aren't using MySQL for GIS queries to begin with.
Add the boundary field to the Place abstract model, that way all subclasses (which is every model but AlternativeName) automatically get boundaries with consistent semantics.

For now, this won't be a problem and SQLite will be supported. There are no differences in Django for creating/editing geometry fields. What we will need to do is to create specific settings to test all the tests against SQLite. I use GitLab at our shop so we have a different method for running two sets of tests (postgresql/sqlite), but we do have two different settings. This can be fairly easy to accomplish as well.

Add the boundary field to the Place abstract model, that way all subclasses (which is every model but AlternativeName) automatically get boundaries with consistent semantics.

I already did that 😄 .

Adding the boundary fields is probably the easy part. The more difficult part to write is the import script. I'd like to allow users to specify exactly which countries they want to import data from, since the import script already takes a long time to complete.

True. First of all we need a source. For countries and continents it's fairly. I don't know any source for city boundary on a global scale. Perhaps OSM can be useful for that.

Keep in mind that some users may not run the migration/s that add boundary fields to models before they run the import script, so check for the boundary field on the model before trying to import it. See this line for how I handle district models without a code field.

Will take extra care with that.

Keep the boundary field nullable, as I don't want to force users with existing data to import boundaries during migrations (I ran into this issue when I added slugs). Migrations to me should only change the database schema, not run a huge import script with a lot of potential to fail in various ways.

Already nullable as well.

In the import script, I think would implement importing boundaries as it's own option (eg: python manage.py cities --import=boundary), and I would not import boundaries in the --import=all option. This caveat will need to be documented.

Ok.

Please include test/s for the import script with a minimum of boundary data (see this file with cherry picked test data.

Ok.

I would suggest using GEOMETRY datatypes instead of GEOGRAPHY datatypes because I'd like this project to apply worldwide. I would rather this project store data without the need for projections even if it means there's a performance hit. See this section of the PostGIS documentation for details.

The fields needs to be GEOMETRY, otherwise it won't have support in Oracle or Sqlite. So that's a no brainer. Already geometry as well.

See the PostGIS overview of datatypes to figure out which one to pick. Specifically, I don't know if a simple Polygon field will work or if a MultiPolygon field would be better. The built-in GIS fields all correspond to GIS field types, so check out the the polygon vs. multi-polygon examples in this section of the PostGIS documentation for details.

MultiPolygon is the correct way to go. Since we already require PostGIS for the location field, I would not worry too much about the native Polygon datatype (I did not see any reference to the native datatype in the project, perhaps I'm wrong).

Do you want to support historical boundaries? If yes, I would suggest utilizing django-field-history on the boundary field. Doing so will automatically save a revision to modified boundaries every time the import command is run, and if the boundary isn't modified it won't save a revision.

I really don't think this belongs to the scope of this project. These boundaries rarely change and the users might care more about the most updated one, instead of a history of boundaries.

I'm already developing some code and the modifications for the fields is already there. As I said, I'm running into issues on how to run makemigrations. If you have any tips on that, let me know.

blag commented

Oh awesome, I didn't realize you already had written code!

And the field already is a MultiPolygon. It should be a MultiPolygon to support countries with islands, for example.

Right! And states like Michigan can only accurately be represented as MultiPolygons. I didn't realize Michigan also had islands as well.

I would not worry too much about the native Polygon datatype (I did not see any reference to the native datatype in the project, perhaps I'm wrong).

The GIS contrib app distributed with Django (django.contrib.gis) has a PolygonField (see here). Not that it matters since it sounds like MultiPolygonField is the better choice and you're already using it. 👍

Do you want to support historical boundaries? If yes, I would suggest utilizing django-field-history on the boundary field. Doing so will automatically save a revision to modified boundaries every time the import command is run, and if the boundary isn't modified it won't save a revision.

I really don't think this belongs to the scope of this project. These boundaries rarely change and the users might care more about the most updated one, instead of a history of boundaries.

I agree, I only bring it up because I don't want to prevent users from implementing that themselves. I want this to be possible only by overriding one of our concrete models and configuring django-field-history instead of forking this project altogether. Hope that makes sense.

As I said, I'm running into issues on how to run makemigrations. If you have any tips on that, let me know.

You might be able to use the management command in test_project to generate migrations for you, but you might have to set your PYTHONPATH environment variable first. In the root of your local django-cities repo, try this:

PYTHONPATH=. python test_project/manage.py makemigrations cities

and see if that works for you.

However, what I'm actually doing is using one of my Django projects to make the migrations for this app. It's a little complicated, but it works for me. My project directory structure looks like:

./project/settings.py  # Contains 'cities' in INSTALLED_APPS
./django-cities  # Git repository for django-cities, added to my project's ./.gitignore
./cities -> ./django-cities/cities  # symlink into the local dango-cities repo
./...other...stuff...

then I can run python manage.py makemigrations cities in my project root to have it generate the migrations in what it thinks is its cities app, when really it's creating them in the django-cities git repo. Double check those migrations for any dependencies outside of django-cities before commiting them to the local django-cities repo though!

The PYTHONPATH trick worked fine.

I'll soon post the model changes and the migration, so you can take a look.

The download of the boundary data is way more involved and I'll take it slow. I need to first identify a good source of data for State/City. Continents and Countries are easy.

Regarding the CLI option, I think we will need to make another switch. Thats because you can download countries AND it's boundaries. Or you might download countries, regions and cities, and it's boundaries. Or you just care about the city boundaries (because the sum of all cities are regions and the sum of all regions are countries, etc).

So perhaps we can make this available for each model?

I should be committing to my fork soon and I'll ping you for a quick review. I'm just installing Py3.6 so we can run tox on it as well. Some of the tests are failing because I don't have all environments (all tests pass on 2.7).

This is great you are pushing it along @george-silva
I didn't have any time to work on it, but hope this project picks back up again to use it. It definitely isn't trivial to add this in there. Put up a donate and I'll buy you a beer. :)

City boundaries using multi-polygon would be great. @george-silva could you push even incomplete code to a fork? I may be able to find time to help this feature happen.

Hi @merwok - glad others are taking interest...
Coincidentally just yesterday I had to pick this up to see where I left off. It's a huge undertaking and I'm not sure I can get the time cleared again. It is probably a month long project. I pulled in the upstream and was just going to ask @george-silva if he had a fork he was working on.

FWIW: In a previous project I rolled my own simple classes for City and Address (didn’t need country or region), with only PostGIS supported (so PointField and MultiPolygonField without compat concerns) and getting some information from Google Maps and from a GeoJSON file published by my city; geo queries worked like a charm. For a new project we wanted to rely on OSM/geonames and standardize on django-cities, so right now we decided we prefer to avoid forking the repo, and we’ll have the multi-polygon field in a custom model linked to cities.City. I can’t spend days on an upstream PR, but if I could be able to contribute things such as manager tests, an import command.

That seems like a very sane way to go about it. Custom models were not working for me though so I had to abandon and did a separate model with OneToOne field. Not ideal.

(Exactly, I meant a separate model with one to one when I said “custom model linked”, given that swapped-in custom models don’t work at the moment)

blag commented

I'm pretty sure @george-silva already pushed his changes to PR #159. It's incomplete, but it's a start.

@merwok and @solvire: Have you checked out that PR?

Hello all. this is my fork: https://github.com/sigma-geosistemas/django-cities

In there you'll find the updated models. What we need help right now is defining the "sources of truth" for each entity (country, city, continent, etc) and to create the utility to download all of this data and update it.

This is the full discussion: #159