kobotoolbox/kobo-docker

Split the HHI production Postgres database in two

jnm opened this issue · 1 comments

jnm commented

To take advantage of the work in #230, we need to actually split a large Postgres database that's currently shared by KPI and KoBoCAT into two databases (one for each of the two Django projects).

  1. ✔️ expand the EBS volume greatly, let's say by 300G (since the total postgres disk use is 372G and we can subtract off logger_instance's 136 G, leaving approx. 236 G in use by the other tables) 💰 this increases our AWS bill by $30/month
  2. ✔️ make a new database for KPI (koboform)
    1. don't forget to create extension postgis and postgis_topology
    2. start a one-off KPI container with docker-compose -f docker-compose.frontend.yml -f docker-compose.frontend.override.yml run --rm kpi bash
    3. change the environment in that container so that KPI uses the new database:
      root@kpi:/srv/src/kpi# DATABASE_URL=$(echo "$DATABASE_URL" | sed 's/kobotoolbox$/koboform/')
      root@kpi:/srv/src/kpi# KPI_DATABASE_URL="$DATABASE_URL"
      root@kpi:/srv/src/kpi# env | grep DATA
      KC_DATABASE_URL=postgis://***@postgres.kobotoolbox.internal:5432/kobotoolbox
      KPI_DATABASE_URL=postgis://***@postgres.kobotoolbox.internal:5432/koboform
      DATABASE_URL=postgis://***@postgres.kobotoolbox.internal:5432/koboform
      
    4. run ./manage.py migrate to set up the KPI schema in the new, empty database
  3. use pglogical or mimeo to replicate the necessary tables from the old database to the new database
    1. ✔️ add the https://dl.2ndquadrant.com/default/release/apt repository; apt-get install postgresql-9.5-pglogical

    2. follow quick setup instructions, which require restarting ⚠️ the postgres server and applying a patch 💣 to kobo-docker:

      diff --git a/postgres/shared/init_02_set_postgres_config.sh b/postgres/shared/init_02_set_postgres_config.sh
      index b8fc1bb..9a0dde6 100644
      --- a/postgres/shared/init_02_set_postgres_config.sh
      +++ b/postgres/shared/init_02_set_postgres_config.sh
      @@ -1,6 +1,12 @@
       #!/usr/bin/env bash
       
      -
      +# make sure pglogical is installed each time the container starts
      +test -e /usr/lib/postgresql/9.5/lib/pglogical.so || (
      +    apt-get update
      +    apt-get install -y curl
      +    curl https://access.2ndquadrant.com/api/repository/dl/default/release/deb | bash
      +    apt-get install -y postgresql-9.5-pglogical
      +)
       
       if [ ! -f "$POSTGRES_CONFIG_FILE.orig" ]; then
           echo "Let's keep a copy of current configuration file!"
      @@ -24,4 +30,4 @@ echo "Applying new client authentication configuration file..."
       cp $KOBO_DOCKER_SCRIPTS_DIR/shared/pg_hba.conf "$POSTGRES_CLIENT_AUTH_FILE"
       
       echo "Creating hg_hba config file..."
      -sed -i "s/KOBO_POSTGRES_REPLICATION_USER/${KOBO_POSTGRES_REPLICATION_USER//\"/}/g" "$POSTGRES_CLIENT_AUTH_FILE"
      \ No newline at end of file
      +sed -i "s/KOBO_POSTGRES_REPLICATION_USER/${KOBO_POSTGRES_REPLICATION_USER//\"/}/g" "$POSTGRES_CLIENT_AUTH_FILE"
      diff --git a/postgres/shared/pg_hba.conf b/postgres/shared/pg_hba.conf
      index af2cea2..62bf860 100644
      --- a/postgres/shared/pg_hba.conf
      +++ b/postgres/shared/pg_hba.conf
      @@ -8,9 +8,9 @@ host    all             all             127.0.0.1/32            trust
       host    all             all             ::1/128                 trust
       # Allow replication connections from localhost, by a user with the
       # replication privilege.
      -#local   replication     postgres                                trust
      -#host    replication     postgres        127.0.0.1/32            trust
      -#host    replication     postgres        ::1/128                 trust
      +local   replication     postgres                                trust
      +host    replication     postgres        127.0.0.1/32            trust
      +host    replication     postgres        ::1/128                 trust
       
       host    all all all md5
       host    replication KOBO_POSTGRES_REPLICATION_USER all md5
      diff --git a/postgres/shared/postgres.conf b/postgres/shared/postgres.conf
      index 5d2a363..dd08d74 100644
      --- a/postgres/shared/postgres.conf
      +++ b/postgres/shared/postgres.conf
      @@ -1,12 +1,18 @@
       #------------------------------------------------------------------------------------
       # REPLICATION
       #------------------------------------------------------------------------------------
      -wal_level = hot_standby
      -
      -max_wal_senders = 2
      -wal_keep_segments = 256
      -
      -hot_standby = on
      +# wal_level = hot_standby
      +#
      +# max_wal_senders = 2
      +# wal_keep_segments = 256
      +# 
      +# hot_standby = on
      +wal_level = 'logical'
      +max_worker_processes = 10   # one per database needed on provider node
      +                            # one per node needed on subscriber node
      +max_replication_slots = 10  # one per node needed on provider node
      +max_wal_senders = 10        # one per node needed on provider node
      +shared_preload_libraries = 'pglogical'
       
       #------------------------------------------------------------------------------------
       # LOGS
    3. set up replication for one table, kpi_asset:

      root@postgres:/# psql -U kobo kobotoolbox
      select pglogical.create_node(
          node_name := 'combined_db_provider',
          dsn := 'host=localhost port=5432 dbname=kobotoolbox'
      );
      select pglogical.replication_set_add_table('default', 'kpi_asset');
      
      \c koboform
      select pglogical.create_node(
          node_name := 'kpi_db_subscriber',
          dsn := 'host=localhost port=5432 dbname=koboform'
      );
      select pglogical.create_subscription(
          subscription_name := 'kpi_db_subscription',
          provider_dsn := 'host=localhost port=5432 dbname=kobotoolbox'
      );
      
    4. after waiting a little while, check the progress with select count(*) kpi_asset; in both databases and select * from pg_stat_replication;

    5. add the rest of the tables listed in https://github.com/kobotoolbox/kobo-docker/blob/kobo-install-two-databases/postgres/master/clone_data_from_kc_to_kpi.sh to the replication set:
      🛑 💣 for instances running code later than the final shared-database release, more tables may be required! HHI and OCHA require kpi_assetuserpartialpermission in addition to the list below.

      select pglogical.replication_set_add_table('default', 'spatial_ref_sys');
      select pglogical.replication_set_add_table('default', 'django_migrations');
      select pglogical.replication_set_add_table('default', 'django_content_type');
      select pglogical.replication_set_add_table('default', 'auth_user');
      select pglogical.replication_set_add_table('default', 'auth_group');
      select pglogical.replication_set_add_table('default', 'auth_permission');
      select pglogical.replication_set_add_table('default', 'auth_group_permissions');
      select pglogical.replication_set_add_table('default', 'auth_user_groups');
      select pglogical.replication_set_add_table('default', 'auth_user_user_permissions');
      select pglogical.replication_set_add_table('default', 'constance_config');
      select pglogical.replication_set_add_table('default', 'django_celery_beat_periodictasks');
      select pglogical.replication_set_add_table('default', 'django_celery_beat_crontabschedule');
      select pglogical.replication_set_add_table('default', 'django_celery_beat_intervalschedule');
      select pglogical.replication_set_add_table('default', 'django_celery_beat_periodictask');
      select pglogical.replication_set_add_table('default', 'django_celery_beat_solarschedule');
      select pglogical.replication_set_add_table('default', 'django_admin_log');
      select pglogical.replication_set_add_table('default', 'authtoken_token');
      select pglogical.replication_set_add_table('default', 'django_digest_partialdigest');
      select pglogical.replication_set_add_table('default', 'taggit_tag');
      select pglogical.replication_set_add_table('default', 'taggit_taggeditem');
      select pglogical.replication_set_add_table('default', 'kpi_collection');
      select pglogical.replication_set_add_table('default', 'reversion_revision');
      select pglogical.replication_set_add_table('default', 'reversion_version');
      select pglogical.replication_set_add_table('default', 'kpi_assetversion');
      select pglogical.replication_set_add_table('default', 'kpi_importtask');
      select pglogical.replication_set_add_table('default', 'kpi_authorizedapplication');
      select pglogical.replication_set_add_table('default', 'kpi_taguid');
      select pglogical.replication_set_add_table('default', 'kpi_objectpermission');
      select pglogical.replication_set_add_table('default', 'kpi_assetsnapshot');
      select pglogical.replication_set_add_table('default', 'kpi_onetimeauthenticationkey');
      select pglogical.replication_set_add_table('default', 'kpi_usercollectionsubscription');
      select pglogical.replication_set_add_table('default', 'kpi_exporttask');
      select pglogical.replication_set_add_table('default', 'kpi_assetfile');
      select pglogical.replication_set_add_table('default', 'hub_sitewidemessage');
      select pglogical.replication_set_add_table('default', 'hub_configurationfile');
      select pglogical.replication_set_add_table('default', 'hub_formbuilderpreference');
      select pglogical.replication_set_add_table('default', 'hub_extrauserdetail');
      select pglogical.replication_set_add_table('default', 'hub_perusersetting');
      select pglogical.replication_set_add_table('default', 'oauth2_provider_application');
      select pglogical.replication_set_add_table('default', 'django_session');
      select pglogical.replication_set_add_table('default', 'oauth2_provider_accesstoken');
      select pglogical.replication_set_add_table('default', 'oauth2_provider_grant');
      select pglogical.replication_set_add_table('default', 'django_digest_usernonce');
      select pglogical.replication_set_add_table('default', 'oauth2_provider_refreshtoken');
      select pglogical.replication_set_add_table('default', 'registration_registrationprofile');
      select pglogical.replication_set_add_table('default', 'hook_hook');
      select pglogical.replication_set_add_table('default', 'hook_hooklog');
      select pglogical.replication_set_add_table('default', 'external_integrations_corsmodel');
      select pglogical.replication_set_add_table('default', 'help_inappmessage');
      select pglogical.replication_set_add_table('default', 'help_inappmessagefile');
      select pglogical.replication_set_add_table('default', 'help_inappmessageuserinteractions');
      
    6. didn't work! the docs say

      It's usually better to create replication sets before subscribing so that all tables are synchronized during initial replication setup in a single initial transaction. However, users of bigger databases may instead wish to create them incrementally for better control.

      pglogical.alter_subscription_resynchronize_table() is available, but it fails because the "table may not be the target of any foreign key constraints." pglogical.alter_subscription_synchronize(subscription_name := 'default', truncate := true) likewise fails because of FK constraints. i dropped the subscription, subscriber, and koboform database and started again, but i still ended up having to TRUNCATE…RESTART IDENTITY CASCADE each table since ./manage.py migrate adds lots of rows, e.g. in django_content_type.

    7. replicate sequences; in the source database, execute (list obtained from \ds in the destination database):

      select pglogical.replication_set_add_sequence('default', 'auth_group_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'auth_group_permissions_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'auth_permission_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'auth_user_groups_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'auth_user_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'auth_user_user_permissions_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'constance_config_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'django_admin_log_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'django_celery_beat_crontabschedule_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'django_celery_beat_intervalschedule_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'django_celery_beat_periodictask_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'django_celery_beat_solarschedule_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'django_content_type_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'django_digest_partialdigest_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'django_digest_usernonce_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'django_migrations_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'external_integrations_corsmodel_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'help_inappmessage_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'help_inappmessagefile_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'help_inappmessageuserinteractions_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'hook_hook_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'hook_hooklog_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'hub_configurationfile_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'hub_extrauserdetail_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'hub_formbuilderpreference_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'hub_perusersetting_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'hub_sitewidemessage_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_asset_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_assetfile_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_assetsnapshot_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_assetuserpartialpermission_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_assetversion_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_authorizedapplication_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_collection_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_exporttask_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_importtask_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_objectpermission_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_onetimeauthenticationkey_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_taguid_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'kpi_usercollectionsubscription_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'oauth2_provider_accesstoken_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'oauth2_provider_application_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'oauth2_provider_grant_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'oauth2_provider_refreshtoken_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'registration_registrationprofile_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'reversion_revision_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'reversion_version_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'taggit_tag_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'taggit_taggeditem_id_seq', true);
      select pglogical.replication_set_add_sequence('default', 'topology_id_seq', true);
      
    8. is it working?

      koboform=# \c kobotoolbox
      You are now connected to database "kobotoolbox" as user "kobo".
      kobotoolbox=# select last_value from auth_user_id_seq ;
       last_value 
      ------------
           148513
      (1 row)
      
      kobotoolbox=# \c koboform
      You are now connected to database "koboform" as user "kobo".
      koboform=# select last_value from auth_user_id_seq ;
       last_value 
      ------------
           149512
      (1 row)
      
  4. later, when the replication has settled, stop the django apps
  5. double-check that the tables in the new database are up to date
  6. change the KPI configuration to use the new database
  7. restart the django apps

Ideas for later about DB cleaning / EBS shrinking

  1. a good deal later, when we want to upgrade postgres:
    1. attach a new EBS volume to the EC2 instance
    2. start a new postgres 12 server that uses the new volume for storage
    3. use pglogical to replicate the in-use, postgres 9.5 server to the new postgres 12 server
    4. stop django apps, verify synchronization, cut over configuration, etc.
    5. consider removing old postgres files from the old EBS data volume and shrinking that volume
      • shrinking requires snapshotting and restoring (downtime!)
      • maybe this can be minimized by moving mongo to the new EBS volume so that there's less data to copy
jnm commented
  • stopped django apps

  • compared the count(*) and max(id) for all replicated tables (no differences found)
    it would've been better to use the array-iteration approach in


    but i didn't:

      #!/bin/bash -x
    
      DB=$1
    
      psql -U kobo $DB -c "select count(*), max(id) from spatial_ref_sys;"
      psql -U kobo $DB -c "select count(*), max(id) from django_migrations;"
      psql -U kobo $DB -c "select count(*), max(id) from django_content_type;"
      psql -U kobo $DB -c "select count(*), max(id) from auth_user;"
      psql -U kobo $DB -c "select count(*), max(id) from auth_group;"
      psql -U kobo $DB -c "select count(*), max(id) from auth_permission;"
      <SNIP>
    
  • in koboform (new database) dropped subscription select pglogical.drop_subscription(subscription_name := 'kpi_db_subscription');

  • in kobotoolbox (old database) renamed kpi_asset table so that any misconfiguration would cause an obvious failure; alter table kpi_asset rename to kpi_asset_old

  • Oli reconfigured KPI to use the new koboform database

  • restarted django apps; tested:

    • submit to existing form
    • deploy new form and submit
    • make new account, deploy new form, submit, enable "require auth...", submit again