coderholic/django-cities

django-cities raises a MultipleObjectsReturned exception while importing

mayela opened this issue · 22 comments

Checklist

  • I have verified that I am using a GIS-enabled database, such as PostGIS or Spatialite.
  • I have verified that the issue exists against the master branch of django-cities.
  • I have searched for similar issues in both open and closed tickets and cannot find a duplicate.
  • I have reduced the issue to the simplest possible case.
  • I have included a failing test as a pull request. (If you are unable to do so we can still accept the issue.)

Steps to reproduce

python project/manage.py cities --import=all

Expected behavior

Successful import

Actual behavior

Traceback (most recent call last):
  File "project/manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/home/vagrant/environment/venv/lib/python3.4/site-packages/django/core/management/__init__.py", line 363, in execute_from_command_line
    utility.execute()
  File "/home/vagrant/environment/venv/lib/python3.4/site-packages/django/core/management/__init__.py", line 355, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/vagrant/environment/venv/lib/python3.4/site-packages/django/core/management/base.py", line 283, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/vagrant/environment/venv/lib/python3.4/site-packages/django/core/management/base.py", line 330, in execute
    output = self.handle(*args, **options)
  File "/usr/lib/python3.4/contextlib.py", line 30, in inner
    return func(*args, **kwds)
  File "/home/vagrant/environment/venv/lib/python3.4/site-packages/cities/management/commands/cities.py", line 160, in handle
    func()
  File "/home/vagrant/environment/venv/lib/python3.4/site-packages/cities/management/commands/cities.py", line 1005, in import_postal_code
    region__country=pc.country)
  File "/home/vagrant/environment/venv/lib/python3.4/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/vagrant/environment/venv/lib/python3.4/site-packages/django/db/models/query.py", line 384, in get
    (self.model._meta.object_name, num)
cities.models.MultipleObjectsReturned: get() returned more than one Subregion -- it returned 2!

I am also encountering this issue. As the traceback above shows, the problem is that some Subregion objects are not unique by the combination of region name, subregion name, and country. I've determined that there are, in fact, 56 subregion names that are non-unique for their countries in the data. Together they represent 119 Subregion objects, whose ids are listed first below. The 56 names follow, each in the form (region id, subregion name): number of matching Subregion objects.

[10402309, 2240688, 7839621, 7839620, 6158841, 6158840, 9179465, 8260564, 10195120, 10192083, 10195161, 10192085, 10193946, 10193226, 10195159, 10193630, 10195184, 10194916, 10201610, 10201580, 10346907, 10346906, 10347016, 10346968, 10347121, 10347095, 2413641, 2413640, 449616, 66575, 449585, 68471, 449494, 122490, 1482479, 1158999, 11495044, 11495013, 11495327, 11495315, 2111742, 2111741, 11497558, 2129758, 1521359, 1521352, 7669283, 2276212, 8051342, 7910308, 1736645, 1732562, 11600787, 11600418, 11600445, 11600442, 11600460, 11600452, 11600486, 11600458, 11607562, 11606028, 11600691, 11600481, 11608001, 11600482, 11608044, 11600488, 11600492, 11607518, 11608040, 11608123, 11606129, 11600493, 11605860, 11600687, 11608422, 11607999, 11601327, 11608130, 11606060, 11608034, 11606251, 11606747, 11606700, 11607574, 11607526, 11607531, 11607528, 11621653, 11607570, 11607567, 11607566, 11607991, 11607571, 11608088, 11607985, 11608073, 11608033, 11608423, 11608421, 11621659, 11621593, 11621643, 11621628, 7530824, 6690157, 7530825, 6690163, 11496463, 1619260, 7870615, 2473636, 7870604, 2473663, 9197251, 699293, 1538277, 1514190]
(2239858, 'Kiwaba Nzoji'): 2
(2058645, 'Narrogin'): 2
(6093943, 'Sudbury'): 2
(209610, 'Maniema'): 2
(146267, 'Ágios Epifánios'): 2
(146267, 'Kaló Chorió'): 2
(146267, 'Agía Marína'): 2
(146267, 'Ágios Geórgios'): 2
(146267, 'Ágios Theódoros'): 2
(146615, 'Ágios Andrónikos'): 2
(146398, 'Kivisíli'): 2
(146383, 'Vása'): 2
(146213, 'Filoúsa'): 2
(2412353, 'Dappo'): 2
(128231, 'Shahrestān-e Sīrjān'): 2
(134766, 'Shahrestān-e Fasā'): 2
(418862, 'Shahrestān-e Naţanz'): 2
(1159456, 'Shahrestān-e Īrānshahr'): 2
(3488715, 'Gibraltar'): 2
(3488081, 'Amity'): 2
(2112669, 'Naka-gun'): 2
(2130037, 'Kamikawa-gun'): 2
(1519367, 'Lenīn Aūdany'): 2
(2278292, 'Jorquelleh'): 2
(2380635, 'Aleg'): 2
(1733039, 'Bahagian Pantai Barat'): 2
(11205571, 'Raghunāthpur'): 2
(11205571, 'Chyuṭāhā'): 2
(11205571, 'Kachorwā'): 2
(11205571, 'Pipari̇̄yā'): 2
(11205571, 'Sundarpur'): 4
(11205571, 'Paḍari̇̄yā'): 2
(11205571, 'Kabilāsi̇̄'): 2
(11205571, 'Aurahi̇̄'): 4
(11205571, 'Moti̇̄pur'): 2
(11205571, 'Basantapur'): 2
(11205571, 'Maheshpur'): 3
(11205571, 'Duhabi̇̄'): 2
(11205571, 'Lakṣmi̇̄pur'): 2
(11205571, 'Barāhakṣetra'): 2
(11205571, 'Kochābakhāri̇̄'): 2
(11205571, 'Barahi̇̄ Birpur'): 2
(11205571, 'Dharmapur'): 4
(11205571, 'Piprā'): 2
(11205571, 'Hanumānnagar'): 2
(11205571, 'Arnamā'): 2
(11205571, 'Barchhawā'): 2
(11205571, 'Basbiṭṭi̇̄'): 2
(11205571, 'Gopālpur'): 2
(858786, 'Powiat nowosądecki'): 2
(858786, 'Powiat tarnowski'): 2
(1607530, 'Amphoe Bang Sai'): 2
(2473637, 'Kef Est'): 2
(2472770, 'Nefza'): 2
(696634, 'Novosanzhars’kyy Rayon'): 2
(1484842, 'Chust Tumani'): 2
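
A count like the one above can be reproduced without the ORM. The following is a minimal sketch, not the query actually used for the list: it assumes the subregions are available as plain (region id, subregion name) pairs and groups them case-insensitively, since the failing lookup uses `iexact`. The function name `find_duplicates` and the last sample row are hypothetical; the other sample rows are taken from the list above.

```python
from collections import Counter

# Toy sample of (region_id, subregion_name) pairs. The first three names and
# their counts come from the list above; the last row is a made-up unique
# entry, added only so the sample contains a non-duplicate.
rows = [
    (2058645, "Narrogin"),
    (2058645, "Narrogin"),
    (6093943, "Sudbury"),
    (6093943, "Sudbury"),
    (11205571, "Maheshpur"),
    (11205571, "Maheshpur"),
    (11205571, "Maheshpur"),
    (2058645, "Example Shire"),  # hypothetical, not from the data
]


def find_duplicates(pairs):
    """Return {(region_id, casefolded name): count} for keys seen more than once."""
    counts = Counter((region_id, name.casefold()) for region_id, name in pairs)
    return {key: n for key, n in counts.items() if n > 1}


duplicates = find_duplicates(rows)
for (region_id, name), n in sorted(duplicates.items()):
    print(f"({region_id}, {name!r}): {n}")
```

Summing the counts of the duplicated keys gives the number of affected objects, which is how 56 names can account for 119 Subregion objects.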

blag commented

Thank you for reporting this, I'll add that data to the tests and fix it when I can. I don't have a lot of free time at the moment.

I also encounter the same problem.

same here

@blag I can help by sending the test. About the info given by @joshourisman: the data is duplicated, so we have to avoid that, right?
Thanks in advance.

blag commented

@mayela I think the fix for this is:

diff --git a/cities/management/commands/cities.py b/cities/management/commands/cities.py
index a71b359..1972838 100644
--- a/cities/management/commands/cities.py
+++ b/cities/management/commands/cities.py
@@ -1005,6 +1005,19 @@ class Command(BaseCommand):
                             region__country=pc.country)
                 except Subregion.DoesNotExist:
                     pc.subregion = None
+                except Subregion.MultipleObjectsReturned:
+                    self.logger.warning("Found multiple subregions for '{}' in '{}' - ignoring".format(
+                        pc.subregion_name,
+                        pc.region_name))
+                    self.logger.debug("item: {}\nsubregions: {}".format(
+                        item,
+                        Subregion.objects.filter(
+                            Q(region__name_std__iexact=pc.region_name) |
+                            Q(region__name__iexact=pc.region_name),
+                            Q(name_std__iexact=pc.subregion_name) |
+                            Q(name__iexact=pc.subregion_name),
+                            region__country=pc.country).values_list('id', flat=True)))
+                    pc.subregion = None
             else:
                 pc.subregion = None

but I don't really have time to add it to the test data and verify it.

works for me

Any update on getting this fix merged into master?

blag commented

@pstreck No update yet, but I did get laid off last Friday! So I'll have some free time later this week to fix this, add the test data, and push it to PyPI.

blag commented

I've been working on this, but I'm having trouble reproducing the issue. If it's not working for you, please post your cities configuration variables from your project's settings.py module.

I have tried commenting out the entire CITIES_FILES variable in test_project/settings.py and then running:

PYTHONPATH=. python test_project/manage.py cities --import=all

from the root of the repository (Python 3.5.3, Django 2.0). All subregions seem to import successfully.

If you can reproduce the issue that way, please also set TRAVIS_LOG_LEVEL to DEBUG when you run the command:

PYTHONPATH=. TRAVIS_LOG_LEVEL=DEBUG python test_project/manage.py cities --import=all 2>&1 | tee everything.log

and copy/paste the everything.log file to a pastebin. Please include your Python and Django versions.

I'm not done wrestling with this, but I don't have as much free time as I thought I would to fix it. Any direction or further troubleshooting information you can give me will help me. Thanks!

@blag Your code above fixed this issue for me.

blag commented

@77cc33 It "worked" in that it doesn't error out anymore, but it doesn't connect up postal codes to their subregion in that case either.
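
The behavior described here can be illustrated outside Django. The following is a minimal sketch under stated assumptions, not the project's code: `MultipleMatches` stands in for Django's `MultipleObjectsReturned`, `get_unique` and `resolve_subregion` are hypothetical helpers, the sample records and their ids are made up, and a missing match is collapsed into `None` rather than raising `DoesNotExist`.

```python
import logging

logger = logging.getLogger("cities_sketch")  # hypothetical logger name


class MultipleMatches(Exception):
    """Stand-in for Django's MultipleObjectsReturned."""


def get_unique(subregions, region_name, subregion_name):
    """Mimic a case-insensitive .get(): one match, None, or an exception."""
    matches = [
        s for s in subregions
        if s["region"].casefold() == region_name.casefold()
        and s["name"].casefold() == subregion_name.casefold()
    ]
    if len(matches) > 1:
        raise MultipleMatches(f"{len(matches)} subregions named {subregion_name!r}")
    return matches[0] if matches else None


def resolve_subregion(subregions, region_name, subregion_name):
    """Return the matching subregion, or None when it is missing or ambiguous."""
    try:
        return get_unique(subregions, region_name, subregion_name)
    except MultipleMatches:
        # Same trade-off as the patch: the import keeps going, but the postal
        # code ends up with no subregion at all.
        logger.warning("Found multiple subregions for %r in %r - ignoring",
                       subregion_name, region_name)
        return None


# Hypothetical records: two subregions share a name within one region.
SUBREGIONS = [
    {"id": 1, "region": "Region A", "name": "Sundarpur"},
    {"id": 2, "region": "Region A", "name": "Sundarpur"},
    {"id": 3, "region": "Region A", "name": "Unique Town"},
]

ambiguous = resolve_subregion(SUBREGIONS, "Region A", "Sundarpur")
unique = resolve_subregion(SUBREGIONS, "Region A", "Unique Town")
```

In this sketch the ambiguous lookup yields `None`, so the crash is gone but the link is lost. Attaching such postal codes to the right subregion would need an extra disambiguating key, which the patch does not attempt.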

I am facing a similar issue. While importing postal codes, I get the following error. Here is the stack trace.

Importing postal codes: 23%|█▍ | 293363/1264588 [2:12:35<7:18:56, 36.88it/s]
Traceback (most recent call last):
  File "manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/home/manimaran/4.0/v4/lib/python3.5/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/home/manimaran/4.0/v4/lib/python3.5/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/manimaran/4.0/v4/lib/python3.5/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/manimaran/4.0/v4/lib/python3.5/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/usr/lib/python3.5/contextlib.py", line 30, in inner
    return func(*args, **kwds)
  File "/home/manimaran/4.0/v4/lib/python3.5/site-packages/cities/management/commands/cities.py", line 160, in handle
    func()
  File "/home/manimaran/4.0/v4/lib/python3.5/site-packages/cities/management/commands/cities.py", line 1061, in import_postal_code
  File "/home/manimaran/4.0/v4/lib/python3.5/site-packages/cities/management/commands/cities.py", line 1029, in import_postal_code
    Q(city__region__name_std__iexact=pc.region_name) |
  File "/home/manimaran/4.0/v4/lib/python3.5/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/manimaran/4.0/v4/lib/python3.5/site-packages/django/db/models/query.py", line 407, in get
    (self.model._meta.object_name, num)
cities.models.MultipleObjectsReturned: get() returned more than one District -- it returned 2!

I've deployed this and hit this error. Django 2.0.7, Python 3.7 in a virtualenv.

I'm able to avoid this error if I only import certain countries, such as US, CA, AU. I tried UK and this failed.

I've just applied the patch posted earlier in this thread and am trying a full import. Another comment suggested that the patch doesn't link those postal codes, though, so I'm a little worried about them. I would really appreciate it if this could be fixed so that the data is consistent. Maybe post your full config so we can confirm what the differences in our settings are?

I'm using SQLite for testing. My settings.py:

CITIES_VALIDATE_POSTAL_CODES = True
# CITIES_POSTAL_CODES = ['US', 'CA', 'AU']
CITIES_FILES = {
    'city': {
        'filename': 'cities1000.zip',
        'urls':     ['http://download.geonames.org/export/dump/'+'{filename}']
    },
}

Same here importing cities1000.zip.

Is this project dead?

blag commented

Maintainer (sort of) here. I don’t have as much time for this project as I once did and I could use some help. If anybody can put this in an up-to-date PR that would help (make sure you include tests, even if it’s mocking a response or a file), but long term I may need to turn over maintenance to somebody else.

If anybody has time, email coderholic directly, explain a bit about who you are, include a few links to your open source contributions (especially Django apps), and ask to be made a maintainer. Please keep me on as a maintainer, as I still have a vested interest in this project and I may have more time for it in the future.

@blag
I have no experience with open source projects, but if you want I can help you. At least I can show how to crash django-cities. (Maybe afterwards I can also write tests and fixes.)

The topic issue still happens.

Traceback (most recent call last):
  File "/home/me/project/app/manage.py", line 24, in <module>
    main()
  File "/home/me/project/app/manage.py", line 20, in main
    execute_from_command_line(sys.argv)
  File "/home/me/project/env/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/home/me/project/env/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/me/project/env/lib/python3.7/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/me/project/env/lib/python3.7/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/usr/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/me/project/env/lib/python3.7/site-packages/cities/management/commands/cities.py", line 160, in handle
    func()
  File "/home/me/project/env/lib/python3.7/site-packages/cities/management/commands/cities.py", line 1006, in import_postal_code
    region__country=pc.country)
  File "/home/me/project/env/lib/python3.7/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/me/project/env/lib/python3.7/site-packages/django/db/models/query.py", line 412, in get
    (self.model._meta.object_name, num)
cities.models.MultipleObjectsReturned: get() returned more than one Subregion -- it returned 2!

struy commented

The same issue
CITIES_FILES = {
    'city': {
        'filename': 'US.zip',
        'urls': ['http://download.geonames.org/export/dump/'+'{filename}']
    },
}

CITIES_LOCALES = ['LANGUAGES']
CITIES_POSTAL_CODES = ['US']

Has anyone managed to make this work?

The same issue
CITIES_FILES = {
    'city': {
        'filename': 'cities1000.zip',
        'urls': ['http://download.geonames.org/export/dump/'+'{filename}']
    },
}

zypro commented

Same here for Districts, as soon as the import of postal codes starts.
So it seems to be a problem with the data coming from geonames.org?

CITIES_FILES = {
    'city': {
        'filename': 'DE.zip',
        'urls':     ['http://download.geonames.org/export/dump/'+'{filename}']
    },
}

(Django 3.1.2, Python 3.8)