Split the HHI production Postgres database in two
jnm opened this issue · 1 comments
To take advantage of the work in #230, we need to actually split a large Postgres database that's currently shared by KPI and KoBoCAT into two databases (one for each of the two Django projects).
- ✔️ expand the EBS volume greatly, let's say by 300G (since the total postgres disk use is 372G and we can subtract off
logger_instance
's 136 G, leaving approx. 236 G in use by the other tables) 💰 this increases our AWS bill by $30/month - ✔️ make a new database for KPI (
koboform
)- don't forget to
create extension postgis
andpostgis_topology
- start a one-off KPI container with
docker-compose -f docker-compose.frontend.yml -f docker-compose.frontend.override.yml run --rm kpi bash
- change the environment in that container so that KPI uses the new database:
root@kpi:/srv/src/kpi# DATABASE_URL=$(echo "$DATABASE_URL" | sed 's/kobotoolbox$/koboform/') root@kpi:/srv/src/kpi# KPI_DATABASE_URL="$DATABASE_URL" root@kpi:/srv/src/kpi# env | grep DATA KC_DATABASE_URL=postgis://***@postgres.kobotoolbox.internal:5432/kobotoolbox KPI_DATABASE_URL=postgis://***@postgres.kobotoolbox.internal:5432/koboform DATABASE_URL=postgis://***@postgres.kobotoolbox.internal:5432/koboform
- run
./manage.py migrate
to set up the KPI schema in the new, empty database
- don't forget to
- use pglogical
or mimeoto replicate the necessary tables from the old database to the new database-
✔️ add the
https://dl.2ndquadrant.com/default/release/apt
repository;apt-get install postgresql-9.5-pglogical
-
follow quick setup instructions, which require restarting
⚠️ the postgres server and applying a patch 💣 tokobo-docker
:diff --git a/postgres/shared/init_02_set_postgres_config.sh b/postgres/shared/init_02_set_postgres_config.sh index b8fc1bb..9a0dde6 100644 --- a/postgres/shared/init_02_set_postgres_config.sh +++ b/postgres/shared/init_02_set_postgres_config.sh @@ -1,6 +1,12 @@ #!/usr/bin/env bash - +# make sure pglogical is installed each time the container starts +test -e /usr/lib/postgresql/9.5/lib/pglogical.so || ( + apt-get update + apt-get install -y curl + curl https://access.2ndquadrant.com/api/repository/dl/default/release/deb | bash + apt-get install -y postgresql-9.5-pglogical +) if [ ! -f "$POSTGRES_CONFIG_FILE.orig" ]; then echo "Let's keep a copy of current configuration file!" @@ -24,4 +30,4 @@ echo "Applying new client authentication configuration file..." cp $KOBO_DOCKER_SCRIPTS_DIR/shared/pg_hba.conf "$POSTGRES_CLIENT_AUTH_FILE" echo "Creating hg_hba config file..." -sed -i "s/KOBO_POSTGRES_REPLICATION_USER/${KOBO_POSTGRES_REPLICATION_USER//\"/}/g" "$POSTGRES_CLIENT_AUTH_FILE" \ No newline at end of file +sed -i "s/KOBO_POSTGRES_REPLICATION_USER/${KOBO_POSTGRES_REPLICATION_USER//\"/}/g" "$POSTGRES_CLIENT_AUTH_FILE" diff --git a/postgres/shared/pg_hba.conf b/postgres/shared/pg_hba.conf index af2cea2..62bf860 100644 --- a/postgres/shared/pg_hba.conf +++ b/postgres/shared/pg_hba.conf @@ -8,9 +8,9 @@ host all all 127.0.0.1/32 trust host all all ::1/128 trust # Allow replication connections from localhost, by a user with the # replication privilege. -#local replication postgres trust -#host replication postgres 127.0.0.1/32 trust -#host replication postgres ::1/128 trust +local replication postgres trust +host replication postgres 127.0.0.1/32 trust +host replication postgres ::1/128 trust host all all all md5 host replication KOBO_POSTGRES_REPLICATION_USER all md5 diff --git a/postgres/shared/postgres.conf b/postgres/shared/postgres.conf index 5d2a363..dd08d74 100644 --- a/postgres/shared/postgres.conf +++ b/postgres/shared/postgres.conf @@ -1,12 +1,18 @@ #------------------------------------------------------------------------------------ # REPLICATION #------------------------------------------------------------------------------------ -wal_level = hot_standby - -max_wal_senders = 2 -wal_keep_segments = 256 - -hot_standby = on +# wal_level = hot_standby +# +# max_wal_senders = 2 +# wal_keep_segments = 256 +# +# hot_standby = on +wal_level = 'logical' +max_worker_processes = 10 # one per database needed on provider node + # one per node needed on subscriber node +max_replication_slots = 10 # one per node needed on provider node +max_wal_senders = 10 # one per node needed on provider node +shared_preload_libraries = 'pglogical' #------------------------------------------------------------------------------------ # LOGS
-
set up replication for one table,
kpi_asset
:root@postgres:/# psql -U kobo kobotoolbox select pglogical.create_node( node_name := 'combined_db_provider', dsn := 'host=localhost port=5432 dbname=kobotoolbox' ); select pglogical.replication_set_add_table('default', 'kpi_asset'); \c koboform select pglogical.create_node( node_name := 'kpi_db_subscriber', dsn := 'host=localhost port=5432 dbname=koboform' ); select pglogical.create_subscription( subscription_name := 'kpi_db_subscription', provider_dsn := 'host=localhost port=5432 dbname=kobotoolbox' );
-
after waiting a little while, check the progress with
select count(*) kpi_asset;
in both databases andselect * from pg_stat_replication;
-
add the rest of the tables listed in https://github.com/kobotoolbox/kobo-docker/blob/kobo-install-two-databases/postgres/master/clone_data_from_kc_to_kpi.sh to the replication set:
🛑 💣 for instances running code later than the final shared-database release, more tables may be required! HHI and OCHA requirekpi_assetuserpartialpermission
in addition to the list below.select pglogical.replication_set_add_table('default', 'spatial_ref_sys'); select pglogical.replication_set_add_table('default', 'django_migrations'); select pglogical.replication_set_add_table('default', 'django_content_type'); select pglogical.replication_set_add_table('default', 'auth_user'); select pglogical.replication_set_add_table('default', 'auth_group'); select pglogical.replication_set_add_table('default', 'auth_permission'); select pglogical.replication_set_add_table('default', 'auth_group_permissions'); select pglogical.replication_set_add_table('default', 'auth_user_groups'); select pglogical.replication_set_add_table('default', 'auth_user_user_permissions'); select pglogical.replication_set_add_table('default', 'constance_config'); select pglogical.replication_set_add_table('default', 'django_celery_beat_periodictasks'); select pglogical.replication_set_add_table('default', 'django_celery_beat_crontabschedule'); select pglogical.replication_set_add_table('default', 'django_celery_beat_intervalschedule'); select pglogical.replication_set_add_table('default', 'django_celery_beat_periodictask'); select pglogical.replication_set_add_table('default', 'django_celery_beat_solarschedule'); select pglogical.replication_set_add_table('default', 'django_admin_log'); select pglogical.replication_set_add_table('default', 'authtoken_token'); select pglogical.replication_set_add_table('default', 'django_digest_partialdigest'); select pglogical.replication_set_add_table('default', 'taggit_tag'); select pglogical.replication_set_add_table('default', 'taggit_taggeditem'); select pglogical.replication_set_add_table('default', 'kpi_collection'); select pglogical.replication_set_add_table('default', 'reversion_revision'); select pglogical.replication_set_add_table('default', 'reversion_version'); select pglogical.replication_set_add_table('default', 'kpi_assetversion'); select pglogical.replication_set_add_table('default', 'kpi_importtask'); select pglogical.replication_set_add_table('default', 'kpi_authorizedapplication'); select pglogical.replication_set_add_table('default', 'kpi_taguid'); select pglogical.replication_set_add_table('default', 'kpi_objectpermission'); select pglogical.replication_set_add_table('default', 'kpi_assetsnapshot'); select pglogical.replication_set_add_table('default', 'kpi_onetimeauthenticationkey'); select pglogical.replication_set_add_table('default', 'kpi_usercollectionsubscription'); select pglogical.replication_set_add_table('default', 'kpi_exporttask'); select pglogical.replication_set_add_table('default', 'kpi_assetfile'); select pglogical.replication_set_add_table('default', 'hub_sitewidemessage'); select pglogical.replication_set_add_table('default', 'hub_configurationfile'); select pglogical.replication_set_add_table('default', 'hub_formbuilderpreference'); select pglogical.replication_set_add_table('default', 'hub_extrauserdetail'); select pglogical.replication_set_add_table('default', 'hub_perusersetting'); select pglogical.replication_set_add_table('default', 'oauth2_provider_application'); select pglogical.replication_set_add_table('default', 'django_session'); select pglogical.replication_set_add_table('default', 'oauth2_provider_accesstoken'); select pglogical.replication_set_add_table('default', 'oauth2_provider_grant'); select pglogical.replication_set_add_table('default', 'django_digest_usernonce'); select pglogical.replication_set_add_table('default', 'oauth2_provider_refreshtoken'); select pglogical.replication_set_add_table('default', 'registration_registrationprofile'); select pglogical.replication_set_add_table('default', 'hook_hook'); select pglogical.replication_set_add_table('default', 'hook_hooklog'); select pglogical.replication_set_add_table('default', 'external_integrations_corsmodel'); select pglogical.replication_set_add_table('default', 'help_inappmessage'); select pglogical.replication_set_add_table('default', 'help_inappmessagefile'); select pglogical.replication_set_add_table('default', 'help_inappmessageuserinteractions');
-
didn't work! the docs say
It's usually better to create replication sets before subscribing so that all tables are synchronized during initial replication setup in a single initial transaction. However, users of bigger databases may instead wish to create them incrementally for better control.
pglogical.alter_subscription_resynchronize_table()
is available, but it fails because the "table may not be the target of any foreign key constraints."pglogical.alter_subscription_synchronize(subscription_name := 'default', truncate := true)
likewise fails because of FK constraints. i dropped the subscription, subscriber, andkoboform
database and started again, but i still ended up having toTRUNCATE…RESTART IDENTITY CASCADE
each table since./manage.py migrate
adds lots of rows, e.g. indjango_content_type
. -
replicate sequences; in the source database, execute (list obtained from
\ds
in the destination database):select pglogical.replication_set_add_sequence('default', 'auth_group_id_seq', true); select pglogical.replication_set_add_sequence('default', 'auth_group_permissions_id_seq', true); select pglogical.replication_set_add_sequence('default', 'auth_permission_id_seq', true); select pglogical.replication_set_add_sequence('default', 'auth_user_groups_id_seq', true); select pglogical.replication_set_add_sequence('default', 'auth_user_id_seq', true); select pglogical.replication_set_add_sequence('default', 'auth_user_user_permissions_id_seq', true); select pglogical.replication_set_add_sequence('default', 'constance_config_id_seq', true); select pglogical.replication_set_add_sequence('default', 'django_admin_log_id_seq', true); select pglogical.replication_set_add_sequence('default', 'django_celery_beat_crontabschedule_id_seq', true); select pglogical.replication_set_add_sequence('default', 'django_celery_beat_intervalschedule_id_seq', true); select pglogical.replication_set_add_sequence('default', 'django_celery_beat_periodictask_id_seq', true); select pglogical.replication_set_add_sequence('default', 'django_celery_beat_solarschedule_id_seq', true); select pglogical.replication_set_add_sequence('default', 'django_content_type_id_seq', true); select pglogical.replication_set_add_sequence('default', 'django_digest_partialdigest_id_seq', true); select pglogical.replication_set_add_sequence('default', 'django_digest_usernonce_id_seq', true); select pglogical.replication_set_add_sequence('default', 'django_migrations_id_seq', true); select pglogical.replication_set_add_sequence('default', 'external_integrations_corsmodel_id_seq', true); select pglogical.replication_set_add_sequence('default', 'help_inappmessage_id_seq', true); select pglogical.replication_set_add_sequence('default', 'help_inappmessagefile_id_seq', true); select pglogical.replication_set_add_sequence('default', 'help_inappmessageuserinteractions_id_seq', true); select pglogical.replication_set_add_sequence('default', 'hook_hook_id_seq', true); select pglogical.replication_set_add_sequence('default', 'hook_hooklog_id_seq', true); select pglogical.replication_set_add_sequence('default', 'hub_configurationfile_id_seq', true); select pglogical.replication_set_add_sequence('default', 'hub_extrauserdetail_id_seq', true); select pglogical.replication_set_add_sequence('default', 'hub_formbuilderpreference_id_seq', true); select pglogical.replication_set_add_sequence('default', 'hub_perusersetting_id_seq', true); select pglogical.replication_set_add_sequence('default', 'hub_sitewidemessage_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_asset_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_assetfile_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_assetsnapshot_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_assetuserpartialpermission_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_assetversion_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_authorizedapplication_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_collection_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_exporttask_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_importtask_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_objectpermission_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_onetimeauthenticationkey_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_taguid_id_seq', true); select pglogical.replication_set_add_sequence('default', 'kpi_usercollectionsubscription_id_seq', true); select pglogical.replication_set_add_sequence('default', 'oauth2_provider_accesstoken_id_seq', true); select pglogical.replication_set_add_sequence('default', 'oauth2_provider_application_id_seq', true); select pglogical.replication_set_add_sequence('default', 'oauth2_provider_grant_id_seq', true); select pglogical.replication_set_add_sequence('default', 'oauth2_provider_refreshtoken_id_seq', true); select pglogical.replication_set_add_sequence('default', 'registration_registrationprofile_id_seq', true); select pglogical.replication_set_add_sequence('default', 'reversion_revision_id_seq', true); select pglogical.replication_set_add_sequence('default', 'reversion_version_id_seq', true); select pglogical.replication_set_add_sequence('default', 'taggit_tag_id_seq', true); select pglogical.replication_set_add_sequence('default', 'taggit_taggeditem_id_seq', true); select pglogical.replication_set_add_sequence('default', 'topology_id_seq', true);
-
is it working?
koboform=# \c kobotoolbox You are now connected to database "kobotoolbox" as user "kobo". kobotoolbox=# select last_value from auth_user_id_seq ; last_value ------------ 148513 (1 row) kobotoolbox=# \c koboform You are now connected to database "koboform" as user "kobo". koboform=# select last_value from auth_user_id_seq ; last_value ------------ 149512 (1 row)
-
- later, when the replication has settled, stop the django apps
- double-check that the tables in the new database are up to date
- go by counts of rows in each table, max PK, and matching sequences?
- probably have to run
pglogical.synchronize_sequence()
- refer to other tables in https://github.com/2ndQuadrant/pglogical/blob/21908f4907a19ba005400ceb2b6de731baff6810/pglogical--2.2.2.sql#L82
- change the KPI configuration to use the new database
- restart the django apps
Ideas for later about DB cleaning / EBS shrinking
- a good deal later, when we want to upgrade postgres:
- attach a new EBS volume to the EC2 instance
- start a new postgres 12 server that uses the new volume for storage
- use pglogical to replicate the in-use, postgres 9.5 server to the new postgres 12 server
- stop django apps, verify synchronization, cut over configuration, etc.
- consider removing old postgres files from the old EBS data volume and shrinking that volume
- shrinking requires snapshotting and restoring (downtime!)
- maybe this can be minimized by moving mongo to the new EBS volume so that there's less data to copy
-
stopped django apps
-
compared the
count(*)
andmax(id)
for all replicated tables (no differences found)
it would've been better to use the array-iteration approach in
but i didn't:#!/bin/bash -x DB=$1 psql -U kobo $DB -c "select count(*), max(id) from spatial_ref_sys;" psql -U kobo $DB -c "select count(*), max(id) from django_migrations;" psql -U kobo $DB -c "select count(*), max(id) from django_content_type;" psql -U kobo $DB -c "select count(*), max(id) from auth_user;" psql -U kobo $DB -c "select count(*), max(id) from auth_group;" psql -U kobo $DB -c "select count(*), max(id) from auth_permission;" <SNIP>
-
in
koboform
(new database) dropped subscriptionselect pglogical.drop_subscription(subscription_name := 'kpi_db_subscription');
-
in
kobotoolbox
(old database) renamedkpi_asset
table so that any misconfiguration would cause an obvious failure;alter table kpi_asset rename to kpi_asset_old
-
Oli reconfigured KPI to use the new
koboform
database -
restarted django apps; tested:
- submit to existing form
- deploy new form and submit
- make new account, deploy new form, submit, enable "require auth...", submit again