mysociety/yournextrepresentative

Migrate YNR from using PopIt as a backend to django-popolo

Closed this issue · 8 comments

Migrate YNR from using PopIt as a backend to django-popolo
mhl commented

To add some more detail to this, the motivations are:

  • PopIt is being deprecated, so we need to stop relying on it within a year in any case
  • PopIt has always made development of YNR more difficult than it should be - e.g. tickets like this or showing the candidate's name on the recent changes page would involve lots of caching and cache invalidation if the page isn't going to cause hundreds of API requests
  • There's always been a problem with data consistency in PopIt - it's been too easy to create malformed data without any error at the time. With PostgreSQL as a back-end we can have the database enforce foreign key constraints, and use validation in Django, which should help to reduce those problems
  • It's always been confusing for admins that some things need to be edited in the Django admin of YNR, while other things need to be edited in the PopIt backend (very carefully, see "data consistenty" above). It would be nice for people to be able to add posts, for example, in the Django admin, when setting up simple election sites.

We definitely want to switch to some schema that can be used via the Django ORM and which is backed by PostgreSQL. The obvious options are:

  1. Using django-popolo (with multi-table inheritance to add the extra custom fields we need (e.g. a foreign key from Membership to the doesn't-exist-yet Election model))
  2. Come up with our own models

The first option (using django-popolo) seems clearly preferable to me, since there would be a lot of commonality with the django-popolo in the schema we would develop anyway, and since we want to provide an API and data export that provides Popolo data in its JSON serialization, sticking closely to the Popolo data model would be advisable. Also, we'd like to support the django-popolo project as much as possible.

To try to divide up this migration task, here are some smaller steps. This includes a couple of extra steps which aren't strictly necessary, but we might as well do at the same time (e.g. moving to storing elections in the database as an Election data, switching to using onBehalfOf).

  • Document the additional data stored in PopIt for YNR which isn't in django-popolo. This certainly includes, but isn't limited to:
    • An election property on Memberships
    • The versions property on Persons
    • The standing_in property on Persons (this could be dropped if a NotACandidate membership role was introduced to replace the meaning of None being assigned to an election)
    • The party_memberships property on Persons (this can be dropped though)
    • The moderator_why_allowed property on a Person's images
    • The user_why_allowed property on a Person's images
    • The uploaded_by_user property on a Person's images
    • The user_justification_for_use property on a Person's images
    • The md5sum property on a Person's images
  • Write a management command for django-popolo (and submit a pull request with that to upstream) that imports the core popolo data from a PopIt instance's /api/v0.1/export.json. (This should be designed such that it's easy to subclass the Command class in order to import additional data from Persons, Organizations, Memberships and Posts that aren't in the core Popolo schema.)
  • Create an Election model (and register it with the Django admin) to replace the definition of elections in the ELECTIONS data structure in elections/*/settings.py. This should hopefully make the setup of election in a new site somewhat easier. Replace existing references to ELECTIONS and the settings derived from it with references to the Election class. (e.g. ELECTIONS_CURRENT should instead be a method on a custom manager for the Election model.)
  • Create subclasses of the django-popolo models that add the additional data discovered in the first step, and add support for versioning of those models. Hopefully one of these packages will be suitable for that: https://www.djangopackages.com/grids/g/versioning/
  • Subclass the django-popolo Command class to migrate data from Popit to these new models. As part of this migration, only create one membership per candidacy, fixing #502
  • Update all the tests and code that uses the old PopItPerson class or referenced Posts in PopIt to use the new models instead (e.g. candidates/views, official_documents, moderation_queue, tasks, etc. cached_counts could be replaced by judicious use of .count() 😄

Just a comment:
This seems like a big task with a lot of dependencies and a lot of testing.
There are probably some features that are going to be delayed until this one is completed.
And since there are some implementations that would be affected by the switch, timing is important.

mhl commented

We're add the stage of converting the views and tests, so here's a checklist:

Views

  • candidates AskForCopyrightAssigment (no changes needed)
  • candidates AreasView
  • candidates AreasOfTypeView
  • candidates InvalidatePersonView should be removed now
  • candidates InvalidatePostView should be removed now
  • candidates CandidacyView
  • candidates CandidacyDeleteView
  • candidates ConstituencyDetailView
  • candidates ConstituencyListView
  • candidates ConstituencyLockView
  • candidates ConstituenciesUnlockedListView
  • candidates ConstituencyRetractWinnerView
  • candidates ConstituenciesDeclaredListView
  • candidates OrderedPartyListView
  • candidates AddressFinderView
  • candidates HelpApiView (no changes needed)
  • candidates HelpAboutView (no changes needed)
  • candidates ContributorsMixin (no changes needed)
  • candidates PartyListView
  • candidates PartyDetailView
  • candidates PersonView
  • candidates RevertPersonView
  • candidates MergePeopleView
  • candidates UpdatePersonView
  • candidates NewPersonView
  • candidates PostListView
  • candidates RecentChangesView
  • candidates LeaderboardView (no changes needed)
  • candidates UserContributions
  • cached_counts AttentionNeededView
  • cached_counts ConstituencyCountsView
  • cached_counts PartyCountsView
  • cached_counts ReportsHomeView
  • tasks TaskHomeView
  • tasks IncompleteFieldView
  • official_documents DocumentView
  • official_documents CreateDocumentView
  • moderation_queue PhotoUploadSuccess
  • moderation_queue PhotoReviewList
  • moderation_queue PhotoReview

Test Files

  • candidates/tests/test_revert.py
  • candidates/tests/test_flash_calls_to_action.py
  • candidates/tests/test_update_person.py removed because it just tested PopIt
  • candidates/tests/test_leaderboard.py
  • candidates/tests/test_merge_view.py
  • candidates/tests/test_person_view.py
  • candidates/tests/test_constituency_lock.py
  • candidates/tests/test_merge_people.py
  • candidates/tests/test_popit_down_middleware.py no longer relevant, removed
  • candidates/tests/test_rename_restriction.py
  • candidates/tests/test_create_from_reduced_json.py
  • candidates/tests/test_feeds.py
  • candidates/tests/test_constituencies_view.py
  • candidates/tests/test_finders.py
  • candidates/tests/test_max_popit_ids.py
  • candidates/tests/test_constituency_view.py
  • candidates/tests/test_update_view.py
  • candidates/tests/test_areas_view.py
  • candidates/tests/test_version_diffs.py
  • candidates/tests/test_posts_view.py
  • candidates/tests/test_party_pages.py
  • candidates/tests/test_create_person.py
  • candidates/tests/test_csv_export.py
  • candidates/tests/test_terms_agreement.py
  • candidates/tests/test_validators.py
  • candidates/tests/test_new_person_view.py
  • candidates/tests/test_api_help_view.py skip the test until #567 is done
  • candidates/tests/test_recent_changes_view.py
  • candidates/tests/test_record_winner.py
  • candidates/tests/test_age.py
  • official_documents/tests/test_models.py
  • official_documents/tests/test_upload.py
  • moderation_queue/tests/test_upload.py
  • moderation_queue/tests/test_queue.py
  • tasks/tests/test_views.py
  • cached_counts/tests.py

Management commands remaining are:

  • candidates_add_identifier.py
  • candidates_add_new_parlparse_ids.py
  • candidates_add_other_name.py
  • candidates_create_csv.py
  • candidates_create_joint_pseudo_parties.py
  • candidates_dump_popit_parties_to_json.py
  • candidates_fix_party_ids_in_json.py removed, no longer needed
  • candidates_get_candidates_over_time_blah.py removed
  • candidates_get_live_database.py removed
  • candidates_import_ppcs.py - not updating as will most likely need new scrapers
  • candidates_import_statements_of_persons_nominated.py
  • candidates_make_party_sets_lookup.py no longer needed
  • candidates_parties_with_multiple_emblems.py
  • candidates_set_max_person_id.py removed, no longer needed
  • candidates_update_incumbents_standing.py removed (see e975469)
  • candidates_update_popit_parties_from_ec_data.py
  • moderation_queue_detect_faces_in_queued_images.py (no changes needed)
mhl commented

There are various things left to do, which we're dividing up into various sections:

Needed in order to set up Jamaica

  • Create the ImageMetadata model with a OneToOneField to Image from mysociety-django-images for the copyright and user comments data, and update the migration and photo review code accordinngly
  • Fix candidates_create_csv, including:
    • Now should take a base URL parameter
    • Find an alternative OpenNorth image proxy (not the one integrated into PopIt) Just using the PopIt one for now

While working on Jamaica

  • Create a management command to create posts and areas (given a MapIt URL, area type and post_id_format, say)
  • Create a management command to create a basic list of parties (given a country) - should use EP or Wikidata data
  • Update installation docs in docs/new-instance.md
  • Run through the results of git grep FIXME or git grep TODO to check they don't include anything critical
  • Run through the major pages checking for egregious numbers of SQL queries

Needed before merging to master

  • Basic read-only API: #567
  • Get UK-specific tests working (using the ./run-tests approach as in Pombola - ./manage.py test will still work out-of-the-box but no run country-specific tests
  • Remove PopIt settings from the settings module completely and hardcode per-country PopIt URLs in the data migration
  • Go through remaining management commands, see what can be removed
  • Some (possibly a lot of) squashing of the branch
  • Re-run the migration for every country, check that they work as expected
  • Warn our partners (St Paul / Argentina / Burkina Faso) about the changes

FIXMEs from the code that should be fixed:

  • Handle election with multiple winners ( views/constituencies ) postponed for the moment, since the old code didn't support this, new ticket here: #582
  • Prevent changing the Post on a candidate to a locked constituency ( views/people ) already checked by check_update_allowed

There's loads of others but they don't seem urgent to me.

mhl commented

This is now merged to master 🎆 🍷 🐨

Woohoo!