Split the OCHA production database in two
jnm opened this issue · 28 comments
Now that we've gotten through #259, it's time for large-and-in-charge OCHA.
Preparing the in-use database
- Installed pglogical and updated Postgres configuration:
ubuntu@ip-10-0-1-223:~/kobo-docker⟫ pwd /home/ubuntu/kobo-docker ubuntu@ip-10-0-1-223:~/kobo-docker⟫ git diff diff --git a/postgres/shared/init_02_set_postgres_config.sh b/postgres/shared/init_02_set_postgres_config.sh index b8fc1bb..9a0dde6 100644 --- a/postgres/shared/init_02_set_postgres_config.sh +++ b/postgres/shared/init_02_set_postgres_config.sh @@ -1,6 +1,12 @@ #!/usr/bin/env bash - +# make sure pglogical is installed each time the container starts +test -e /usr/lib/postgresql/9.5/lib/pglogical.so || ( + apt-get update + apt-get install -y curl + curl https://access.2ndquadrant.com/api/repository/dl/default/release/deb | bash + apt-get install -y postgresql-9.5-pglogical +) if [ ! -f "$POSTGRES_CONFIG_FILE.orig" ]; then echo "Let's keep a copy of current configuration file!" @@ -24,4 +30,4 @@ echo "Applying new client authentication configuration file..." cp $KOBO_DOCKER_SCRIPTS_DIR/shared/pg_hba.conf "$POSTGRES_CLIENT_AUTH_FILE" echo "Creating hg_hba config file..." -sed -i "s/KOBO_POSTGRES_REPLICATION_USER/${KOBO_POSTGRES_REPLICATION_USER//\"/}/g" "$POSTGRES_CLIENT_AUTH_FILE" \ No newline at end of file +sed -i "s/KOBO_POSTGRES_REPLICATION_USER/${KOBO_POSTGRES_REPLICATION_USER//\"/}/g" "$POSTGRES_CLIENT_AUTH_FILE" diff --git a/postgres/shared/pg_hba.conf b/postgres/shared/pg_hba.conf index af2cea2..48bf35e 100644 --- a/postgres/shared/pg_hba.conf +++ b/postgres/shared/pg_hba.conf @@ -14,3 +14,5 @@ host all all ::1/128 trust host all all all md5 host replication KOBO_POSTGRES_REPLICATION_USER all md5 +# temporary for two-database split; the superuser has to be able to replicate +host replication kobo all md5 diff --git a/postgres/shared/postgres.conf b/postgres/shared/postgres.conf index 89fe222..0337ab3 100644 --- a/postgres/shared/postgres.conf +++ b/postgres/shared/postgres.conf @@ -1,13 +1,24 @@ #------------------------------------------------------------------------------------ # REPLICATION #------------------------------------------------------------------------------------ -wal_level = hot_standby +#wal_level = hot_standby -max_wal_senders = 2 +#max_wal_senders = 2 wal_keep_segments = 512 hot_standby = on +### UNCOMMENT ABOVE AND REMOVE BELOW TO RETURN TO NORMALCY ### + +wal_level = 'logical' +max_worker_processes = 10 # one per database needed on provider node + # one per node needed on subscriber node +max_replication_slots = 10 # one per node needed on provider node +max_wal_senders = 10 # one per node needed on provider node +shared_preload_libraries = 'pglogical' + +##################### END OF NON-NORMAL STUFF ################ + #------------------------------------------------------------------------------------ # LOGS #------------------------------------------------------------------------------------
- Stopped in-use Postgres server with
kill -INT
per https://www.postgresql.org/docs/9.5/server-shutdown.html - Brought it back up:
ubuntu@ip-10-0-1-223:~/kobo-docker⟫ docker-compose -f docker-compose.backend.master.yml -f docker-compose.backend.master.override.yml up -d postgres
- Checked https://kobo.humanitarianresponse.info/service_health/; OK
⚠️ The reconfiguration broke the secondary; the container is in a restart loop. Problem:hot standby is not possible because max_worker_processes = 8 is a lower setting than on the master server (its value was 10)
- Changing this setting on the master didn't help; do the WALs contain the value at the time of their creation?
- Upping the setting on the secondary worked (according to
select * from pg_stat_replication
on the master)
- What gives? We set this to
10
:postgres=# select name, setting, sourcefile from pg_settings where name='max_wal_senders'; name | setting | sourcefile -----------------+---------+----------------------------------------------- max_wal_senders | 2 | /var/lib/postgresql/data/postgresql.auto.conf (1 row)
- Apparently
/var/lib/postgresql/data/postgresql.auto.conf
is created byALTER SYSTEM
? The file was last modified in November of 2018😒 - Trashed it!
max_wal_senders
was the only setting there. Now the value is10
as expected (and I figure we might need at least 3)
- Apparently
- Created the pglogical node and replication set on the in-use database:
root@postgres:/# psql -U kobo kobotoolbox psql (9.5.16) Type "help" for help. kobotoolbox=# create extension pglogical; CREATE EXTENSION kobotoolbox=# select pglogical.create_node( kobotoolbox(# node_name := 'combined_db_provider', kobotoolbox(# dsn := 'host=localhost port=5432 dbname=kobotoolbox' kobotoolbox(# ); create_node ------------- 315837889 (1 row) kobotoolbox=# select pglogical.create_replication_set('kpi', true, true, true, true); create_replication_set ------------------------ 483997336 (1 row)
- Added all the KPI tables to the replication set:
select pglogical.replication_set_add_table('kpi', 'spatial_ref_sys'); select pglogical.replication_set_add_table('kpi', 'django_migrations'); select pglogical.replication_set_add_table('kpi', 'django_content_type'); select pglogical.replication_set_add_table('kpi', 'auth_user'); select pglogical.replication_set_add_table('kpi', 'auth_group'); select pglogical.replication_set_add_table('kpi', 'auth_permission'); select pglogical.replication_set_add_table('kpi', 'auth_group_permissions'); select pglogical.replication_set_add_table('kpi', 'auth_user_groups'); select pglogical.replication_set_add_table('kpi', 'auth_user_user_permissions'); select pglogical.replication_set_add_table('kpi', 'constance_config'); select pglogical.replication_set_add_table('kpi', 'django_celery_beat_periodictasks'); select pglogical.replication_set_add_table('kpi', 'django_celery_beat_crontabschedule'); select pglogical.replication_set_add_table('kpi', 'django_celery_beat_intervalschedule'); select pglogical.replication_set_add_table('kpi', 'django_celery_beat_periodictask'); select pglogical.replication_set_add_table('kpi', 'django_celery_beat_solarschedule'); select pglogical.replication_set_add_table('kpi', 'django_admin_log'); select pglogical.replication_set_add_table('kpi', 'authtoken_token'); select pglogical.replication_set_add_table('kpi', 'django_digest_partialdigest'); select pglogical.replication_set_add_table('kpi', 'taggit_tag'); select pglogical.replication_set_add_table('kpi', 'taggit_taggeditem'); select pglogical.replication_set_add_table('kpi', 'kpi_collection'); select pglogical.replication_set_add_table('kpi', 'kpi_asset'); select pglogical.replication_set_add_table('kpi', 'reversion_revision'); select pglogical.replication_set_add_table('kpi', 'reversion_version'); select pglogical.replication_set_add_table('kpi', 'kpi_assetversion'); select pglogical.replication_set_add_table('kpi', 'kpi_importtask'); select pglogical.replication_set_add_table('kpi', 'kpi_authorizedapplication'); select pglogical.replication_set_add_table('kpi', 'kpi_taguid'); select pglogical.replication_set_add_table('kpi', 'kpi_objectpermission'); select pglogical.replication_set_add_table('kpi', 'kpi_assetsnapshot'); select pglogical.replication_set_add_table('kpi', 'kpi_onetimeauthenticationkey'); select pglogical.replication_set_add_table('kpi', 'kpi_usercollectionsubscription'); select pglogical.replication_set_add_table('kpi', 'kpi_exporttask'); select pglogical.replication_set_add_table('kpi', 'kpi_assetfile'); select pglogical.replication_set_add_table('kpi', 'hub_sitewidemessage'); select pglogical.replication_set_add_table('kpi', 'hub_configurationfile'); select pglogical.replication_set_add_table('kpi', 'hub_formbuilderpreference'); select pglogical.replication_set_add_table('kpi', 'hub_extrauserdetail'); select pglogical.replication_set_add_table('kpi', 'hub_perusersetting'); select pglogical.replication_set_add_table('kpi', 'oauth2_provider_application'); select pglogical.replication_set_add_table('kpi', 'django_session'); select pglogical.replication_set_add_table('kpi', 'oauth2_provider_accesstoken'); select pglogical.replication_set_add_table('kpi', 'oauth2_provider_grant'); select pglogical.replication_set_add_table('kpi', 'django_digest_usernonce'); select pglogical.replication_set_add_table('kpi', 'oauth2_provider_refreshtoken'); select pglogical.replication_set_add_table('kpi', 'registration_registrationprofile'); select pglogical.replication_set_add_table('kpi', 'hook_hook'); select pglogical.replication_set_add_table('kpi', 'hook_hooklog'); select pglogical.replication_set_add_table('kpi', 'external_integrations_corsmodel'); select pglogical.replication_set_add_table('kpi', 'help_inappmessage'); select pglogical.replication_set_add_table('kpi', 'help_inappmessagefile'); select pglogical.replication_set_add_table('kpi', 'help_inappmessageuserinteractions'); select pglogical.replication_set_add_table('kpi', 'kpi_assetuserpartialpermission');
- Added all the sequences:
select pglogical.replication_set_add_sequence('kpi', 'auth_group_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'auth_group_permissions_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'auth_permission_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'auth_user_groups_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'auth_user_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'auth_user_user_permissions_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'constance_config_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'django_admin_log_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'django_celery_beat_crontabschedule_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'django_celery_beat_intervalschedule_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'django_celery_beat_periodictask_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'django_celery_beat_solarschedule_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'django_content_type_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'django_digest_partialdigest_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'django_digest_usernonce_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'django_migrations_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'external_integrations_corsmodel_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'help_inappmessage_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'help_inappmessagefile_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'help_inappmessageuserinteractions_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'hook_hook_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'hook_hooklog_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'hub_configurationfile_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'hub_extrauserdetail_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'hub_formbuilderpreference_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'hub_perusersetting_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'hub_sitewidemessage_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_asset_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_assetfile_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_assetsnapshot_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_assetuserpartialpermission_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_assetversion_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_authorizedapplication_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_collection_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_exporttask_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_importtask_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_objectpermission_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_onetimeauthenticationkey_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_taguid_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'kpi_usercollectionsubscription_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'oauth2_provider_accesstoken_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'oauth2_provider_application_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'oauth2_provider_grant_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'oauth2_provider_refreshtoken_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'registration_registrationprofile_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'reversion_revision_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'reversion_version_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'taggit_tag_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'taggit_taggeditem_id_seq', true); select pglogical.replication_set_add_sequence('kpi', 'topology_id_seq', true);
Prepare the new database
- Made new 1666 GiB EBS volume (old one was 1333)
- Attached to EC2 instance
mkfs.ext4
, mounted to/mnt/new
- Wrote a minimal Dockerfile that includes pglogical:
FROM mdillon/postgis:9.5 RUN apt-get update \ && apt-get install -y curl \ && (curl https://access.2ndquadrant.com/api/repository/dl/default/release/deb | bash) \ && apt-get install -y postgresql-9.5-pglogical
- Wrote a minimal Docker Compose file:
version: '2.2' services: postgres: volumes: - ./dbvol:/var/lib/postgresql/data hostname: newpg build: .
docker-compose up -d
;docker-compose exec postgres bash
- Configured for pglogical:
root@newpg:/# psql -U postgres psql (9.5.16) Type "help" for help. postgres=# alter system set wal_level to 'logical'; ALTER SYSTEM postgres=# alter system set max_worker_processes to 10; ALTER SYSTEM postgres=# alter system set max_replication_slots to 10; ALTER SYSTEM postgres=# alter system set max_wal_senders to 10; ALTER SYSTEM postgres=# alter system set shared_preload_libraries to 'pglogical'; ALTER SYSTEM
docker-compose restart
for the configuration to take effect- Created the new database with necessary extensions, and created the role
kobo
to save some work later;postgres=# create database koboform; CREATE DATABASE postgres=# \c koboform You are now connected to database "koboform" as user "postgres". koboform=# create extension postgis; CREATE EXTENSION koboform=# create extension postgis_topology; CREATE EXTENSION koboform=# create extension pglogical; CREATE EXTENSION koboform=# create role kobo; CREATE ROLE
- Since I didn't map any ports for KPI to connect to my new database and create its tables, I cheated by getting the KPI schema from HHI PROD with
pg_dump --schema-only --username=kobo --dbname=koboform | bzip2 | base64
and loading it into my new database withbase64 -d | bunzip2 | psql -U postgres koboform
- Let's try some replication:
root@newpg:/# psql -U postgres koboform psql (9.5.16) Type "help" for help. koboform=# select pglogical.create_node( koboform(# node_name := 'kpi_db_subscriber', koboform(# dsn := 'host=localhost port=5432 dbname=koboform' koboform(# ); create_node ------------- 1580755673 (1 row) koboform=# select pglogical.create_subscription( koboform(# subscription_name := 'kpi_db_subscription', koboform(# replication_sets := '{kpi}', koboform(# provider_dsn := 'host=172.17.0.1 port=5432 dbname=kobotoolbox user=kobo password=ahem' koboform(# ); create_subscription --------------------- 3132487805 (1 row)
df -h /mnt/new
shows the data pouring in.docker-compose logs -f
has only this repeating complaint, which I think is okay given the absurdly large amount of writing the replication process is doing:LOG: checkpoints are occurring too frequently (24 seconds apart) HINT: Consider increasing the configuration parameter "max_wal_size".
😴
Failed:
postgres_1 | ERROR: duplicate key value violates unique constraint "taggit_tag_name_key"
postgres_1 | DETAIL: Key (name)=(1) already exists.
postgres_1 | CONTEXT: COPY taggit_tag, line 419
postgres_1 | STATEMENT: COPY "public"."taggit_tag" ("id","name","slug") FROM stdin
postgres_1 | ERROR: current transaction is aborted, commands ignored until end of transaction block
postgres_1 | STATEMENT: COPY "public"."kpi_importtask" ("id","data","status","uid","date_created","user_id","messages") FROM stdin
postgres_1 | ERROR: table copy failed
postgres_1 | DETAIL: Query 'COPY "public"."kpi_importtask" ("id","data","status","uid","date_created","user_id","messages") FROM stdin':
postgres_1 | LOG: apply worker [92] at slot 2 generation 1 exiting with error
postgres_1 | LOG: worker process: pglogical apply 20623:3132487805 (PID 92) exited with exit code 1
postgres_1 | LOG: starting apply for subscription kpi_db_subscription
postgres_1 | ERROR: subscriber kpi_db_subscription initialization failed during nonrecoverable step (d), please try the setup again
Let's follow 2ndQuadrant/pglogical#156 (comment) and synchronize the tables individually. On the source, we'll create a new provider node and empty replication set:
select pglogical.create_node(
node_name := 'combined_db_provider',
dsn := 'host=localhost port=5432 dbname=kobotoolbox'
);
select pglogical.create_replication_set('kpi', true, true, true, true);
On the destination, we'll create a new subscriber node and subscription:
select pglogical.create_node(
node_name := 'kpi_db_subscriber',
dsn := 'host=localhost port=5432 dbname=koboform'
);
select pglogical.create_subscription(
subscription_name := 'kpi_db_subscription',
replication_sets := '{kpi}',
provider_dsn := 'host=172.17.0.1 port=5432 dbname=kobotoolbox user=kobo password=ahem'
);
Then, on the source again, we'll add the tables individually (or in small groups)—using synchronize_data := true
—and watch the synchronization status:
select pglogical.replication_set_add_table('kpi', 'spatial_ref_sys', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_migrations', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_content_type', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_user', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_group', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_permission', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_group_permissions', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_user_groups', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'auth_user_user_permissions', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'constance_config', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_celery_beat_periodictasks', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_celery_beat_crontabschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_celery_beat_intervalschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_celery_beat_periodictask', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_celery_beat_solarschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_admin_log', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'authtoken_token', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_digest_partialdigest', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'taggit_tag', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'taggit_taggeditem', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_collection', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_asset', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'reversion_revision', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'reversion_version', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_assetversion', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_importtask', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_authorizedapplication', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_taguid', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_objectpermission', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_assetsnapshot', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_onetimeauthenticationkey', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_usercollectionsubscription', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_exporttask', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'kpi_assetfile', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hub_sitewidemessage', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hub_configurationfile', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hub_formbuilderpreference', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hub_extrauserdetail', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hub_perusersetting', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'oauth2_provider_application', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_session', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'oauth2_provider_accesstoken', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'oauth2_provider_grant', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'django_digest_usernonce', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'oauth2_provider_refreshtoken', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'registration_registrationprofile', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hook_hook', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'hook_hooklog', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'external_integrations_corsmodel', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'help_inappmessage', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'help_inappmessagefile', synchronize_data := true);
select pglogical.replication_set_add_table('kpi', 'help_inappmessageuserinteractions', synchronize_data := true);
Checking the status on the source database, kpi_db_subscription_copy
in the startup
state indicates the initial copy is still ongoing:
kobotoolbox=# \x
Expanded display is on.
kobotoolbox=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid | 106
usesysid | 136335772
usename | kobo_replication
application_name | walreceiver
client_addr | 10.0.1.251
client_hostname |
client_port | 44422
backend_start | 2020-01-14 07:19:20.40829+00
backend_xmin | 461708255
state | streaming
sent_location | AF7/EF66B430
write_location | AF7/EF66B430
flush_location | AF7/EF66B430
replay_location | AF7/EF66B398
sync_priority | 0
sync_state | async
-[ RECORD 2 ]----+------------------------------
pid | 29988
usesysid | 16384
usename | kobo
application_name | kpi_db_subscription
client_addr | 172.18.0.1
client_hostname |
client_port | 56304
backend_start | 2020-01-15 01:46:32.440657+00
backend_xmin |
state | streaming
sent_location | AF7/EF66B430
write_location | AF7/EF66B430
flush_location | AF7/EF66B430
replay_location | AF7/EF66B430
sync_priority | 0
sync_state | async
-[ RECORD 3 ]----+------------------------------
pid | 11636
usesysid | 16384
usename | kobo
application_name | kpi_db_subscription_copy
client_addr | 172.18.0.1
client_hostname |
client_port | 56874
backend_start | 2020-01-15 02:17:15.151641+00
backend_xmin | 461712515
state | startup
sent_location |
write_location |
flush_location |
replay_location |
sync_priority | 0
sync_state | async
On the destination, we want all the sync_status
es to be r
:
koboform=# select * from pglogical.local_sync_status ;
sync_kind | sync_subid | sync_nspname | sync_relname | sync_status | sync_statuslsn
-----------+------------+--------------+-------------------------------------+-------------+----------------
d | 3132487805 | public | auth_group | r | AF7/CEF8F720
d | 3132487805 | | | r | 0/0
d | 3132487805 | public | auth_user_groups | r | AF7/CEFEF118
d | 3132487805 | public | spatial_ref_sys | r | AF7/CECCD4B8
d | 3132487805 | public | auth_permission | r | AF7/CEF932F8
d | 3132487805 | public | django_celery_beat_periodictask | r | AF7/CF43CAA0
d | 3132487805 | public | django_migrations | r | AF7/CED39E20
d | 3132487805 | public | kpi_collection | r | AF7/D1EF4AD8
d | 3132487805 | public | authtoken_token | r | AF7/CEFAD048
d | 3132487805 | public | auth_user_user_permissions | r | AF7/CF3A9628
d | 3132487805 | public | django_content_type | r | AF7/CED4D758
d | 3132487805 | public | constance_config | r | AF7/CF3CF9F8
d | 3132487805 | public | django_digest_partialdigest | r | AF7/CEFEEED8
d | 3132487805 | public | auth_user | r | AF7/CED881F8
d | 3132487805 | public | django_celery_beat_periodictasks | r | AF7/CF3F5698
d | 3132487805 | public | django_celery_beat_solarschedule | r | AF7/CF43D448
d | 3132487805 | public | auth_group_permissions | r | AF7/CEFEEFF8
d | 3132487805 | public | django_celery_beat_crontabschedule | r | AF7/CF401B28
d | 3132487805 | public | reversion_version | d | AF7/D3397DB8
d | 3132487805 | public | django_admin_log | r | AF7/CF43FF80
d | 3132487805 | public | taggit_tag | r | AF7/CF3ABAD0
d | 3132487805 | public | reversion_revision | r | AF7/D3397D80
d | 3132487805 | public | kpi_asset | i | 0/0
d | 3132487805 | public | django_celery_beat_intervalschedule | r | AF7/CF409538
d | 3132487805 | public | taggit_taggeditem | r | AF7/D1E92790
(25 rows)
We can get an idea of how the initial copy is progressing by using pgstattuple()
, which tells us the number of (uncommitted) rows in the table even when count(*)
returns zero:
koboform=# select count(*) from kpi_assetversion;
count
-------
0
(1 row)
koboform=# select tuple_count from pgstattuple('kpi_assetversion');
tuple_count
-------------
1229244
(1 row)
Success. Unsure why the taggit_tag
unique constraint violation didn't trip us up again.
taggit_tag
did fail and is empty
koboform=# select * from pglogical.local_sync_status ;
sync_kind | sync_subid | sync_nspname | sync_relname | sync_status | sync_statuslsn
-----------+------------+--------------+-------------------------------------+-------------+----------------
d | 3132487805 | public | auth_group | r | AF7/CEF8F720
d | 3132487805 | | | r | 0/0
d | 3132487805 | public | kpi_asset | r | AF7/FFE4B3C8
d | 3132487805 | public | auth_user_groups | r | AF7/CEFEF118
d | 3132487805 | public | spatial_ref_sys | r | AF7/CECCD4B8
d | 3132487805 | public | auth_permission | r | AF7/CEF932F8
d | 3132487805 | public | django_celery_beat_periodictask | r | AF7/CF43CAA0
d | 3132487805 | public | django_migrations | r | AF7/CED39E20
d | 3132487805 | public | kpi_collection | r | AF7/D1EF4AD8
d | 3132487805 | public | authtoken_token | r | AF7/CEFAD048
d | 3132487805 | public | auth_user_user_permissions | r | AF7/CF3A9628
d | 3132487805 | public | django_content_type | r | AF7/CED4D758
d | 3132487805 | public | constance_config | r | AF7/CF3CF9F8
d | 3132487805 | public | django_digest_partialdigest | r | AF7/CEFEEED8
d | 3132487805 | public | auth_user | r | AF7/CED881F8
d | 3132487805 | public | django_celery_beat_periodictasks | r | AF7/CF3F5698
d | 3132487805 | public | django_celery_beat_solarschedule | r | AF7/CF43D448
d | 3132487805 | public | kpi_usercollectionsubscription | r | AFA/97C08B90
d | 3132487805 | public | kpi_authorizedapplication | r | AFA/95C9AC70
d | 3132487805 | public | auth_group_permissions | r | AF7/CEFEEFF8
d | 3132487805 | public | django_celery_beat_crontabschedule | r | AF7/CF401B28
d | 3132487805 | public | django_admin_log | r | AF7/CF43FF80
d | 3132487805 | public | taggit_tag | r | AF7/CF3ABAD0
d | 3132487805 | public | reversion_revision | r | AF7/D3397D80
d | 3132487805 | public | django_celery_beat_intervalschedule | r | AF7/CF409538
d | 3132487805 | public | kpi_assetversion | r | AFA/76308F80
d | 3132487805 | public | taggit_taggeditem | r | AF7/D1E92790
d | 3132487805 | public | kpi_objectpermission | r | AFA/97BE5988
d | 3132487805 | public | reversion_version | r | AF7/F6B580E8
d | 3132487805 | public | kpi_onetimeauthenticationkey | r | AFA/A41DEC20
d | 3132487805 | public | kpi_importtask | r | AFA/91972DB8
d | 3132487805 | public | kpi_taguid | r | AFA/96414898
d | 3132487805 | public | kpi_exporttask | r | AFA/AD144F08
d | 3132487805 | public | kpi_assetsnapshot | r | AFA/A41D5380
d | 3132487805 | public | hub_configurationfile | r | AFA/AD17C1B8
d | 3132487805 | public | hub_formbuilderpreference | r | AFA/AD17C1F0
d | 3132487805 | public | hub_extrauserdetail | r | AFA/AD1B20C8
d | 3132487805 | public | hub_perusersetting | r | AFA/AD1C9900
d | 3132487805 | public | kpi_assetfile | r | AFA/AD1CEC90
d | 3132487805 | public | hub_sitewidemessage | r | AFA/AD1E7DD0
d | 3132487805 | public | oauth2_provider_application | r | AFA/AD1E7E30
d | 3132487805 | public | oauth2_provider_accesstoken | r | AFA/AD6C8248
d | 3132487805 | public | oauth2_provider_grant | r | AFA/AD6D3108
d | 3132487805 | public | django_digest_usernonce | r | AFA/AD6DC950
d | 3132487805 | public | help_inappmessage | r | AFA/AEFBCB98
d | 3132487805 | public | django_session | r | AFA/AD6ADBE8
d | 3132487805 | public | help_inappmessagefile | r | AFA/AEFBE960
d | 3132487805 | public | oauth2_provider_refreshtoken | r | AFA/AD6DF0B8
d | 3132487805 | public | registration_registrationprofile | r | AFA/AD725760
d | 3132487805 | public | help_inappmessageuserinteractions | r | AFA/AEFC1C18
d | 3132487805 | public | hook_hook | r | AFA/AEEA8F60
d | 3132487805 | public | hook_hooklog | r | AFA/AEF5DF18
d | 3132487805 | public | external_integrations_corsmodel | r | AFA/AEFB5B00
(53 rows)
Added the KPI sequences per above.
Let's try KoBoCAT.
Source
select pglogical.create_replication_set('kobocat', true, true, true, true);
Destination
select pglogical.create_node(
node_name := 'kobocat_db_subscriber',
dsn := 'host=localhost port=5432 dbname=kobocat'
);
select pglogical.create_subscription(
subscription_name := 'kobocat_db_subscription',
replication_sets := '{kobocat}',
provider_dsn := 'host=172.17.0.1 port=5432 dbname=kobotoolbox user=kobo password=ahem'
);
Source
(ordered to avoid violating FK constraints)
select pglogical.replication_set_add_table('kobocat', 'django_migrations', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_user', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_group', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_permission', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_user_user_permissions', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_group_permissions', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'auth_user_groups', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'authtoken_token', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'corsheaders_corsmodel', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_admin_log', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_celery_beat_crontabschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_celery_beat_intervalschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_celery_beat_periodictask', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_celery_beat_periodictasks', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_celery_beat_solarschedule', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_content_type', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_digest_partialdigest', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_digest_usernonce', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_session', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'django_site', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'oauth2_provider_accesstoken', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'oauth2_provider_application', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'oauth2_provider_grant', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'oauth2_provider_refreshtoken', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'registration_registrationprofile', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'guardian_groupobjectpermission', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'guardian_userobjectpermission', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'api_organizationprofile', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'api_project', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'api_project_user_stars', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'api_team', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'api_team_projects', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'taggit_tag', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'taggit_taggeditem', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_xform', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'stats_statscount', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'viewer_columnrename', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'viewer_export', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'api_projectxform', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_surveytype', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'main_metadata', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'main_userprofile', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'main_tokenstoragemodel', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'restservice_restservice', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_instance', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_instancehistory', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_attachment', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_note', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'logger_ziggyinstance', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'viewer_instancemodification', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'viewer_parsedinstance', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'reversion_revision', synchronize_data := true);
select pglogical.replication_set_add_table('kobocat', 'reversion_version', synchronize_data := true);
Need to compare row counts (and maximum PKs?) for all tables.
Plus, there are already some errors to investigate:
ERROR: duplicate key value violates unique constraint "taggit_tag_name_key"
ERROR: duplicate key value violates unique constraint "viewer_export_xform_id_7a553159ed21a101_uniq"
ERROR: duplicate key value violates unique constraint "logger_surveytype_slug_key"
Taking taggit_tag
as an example, we can see that tags added since we configured pglogical are replicated, but none of the tags before that were copied over:
root@newpg:/# psql -U postgres -c 'select * from taggit_tag' kobocat
id | name | slug
------+--------+------
2983 | សាលា | _22
2985 | សៀមរាប | _23
(2 rows)
Let's fix that by copying the old data, using --inserts
to give each row its own statement since we know in advance that some rows will fail:
pg_dump -U kobo -h 172.17.0.1 --data-only --inserts --table=taggit_tag kobotoolbox | psql -U postgres kobocat
The result is a lot of INSERT 0 1
(indicating success), but also a lot of failures, e.g.
ERROR: duplicate key value violates unique constraint "taggit_tag_name_key"
DETAIL: Key (name)=(demographic) already exists.
Unfortunately, some of these duplicates that couldn't be added are actually referenced by tagged items:
kobocat=# select count(distinct tag_id) from taggit_taggeditem where tag_id not in (select id from taggit_tag as _);
count
-------
247
(1 row)
Our job is to split a database, not clean up inconsistency. Let's drop the taggit_tag
constraints and the FK from taggit_taggeditem
so that we can truncate taggit_tag
and start over:
kobocat=# alter table taggit_tag drop constraint taggit_tag_name_key;
ALTER TABLE
kobocat=# alter table taggit_tag drop constraint taggit_tag_slug_key;
ALTER TABLE
kobocat=# alter table taggit_taggeditem drop constraint taggit_taggeditem_tag_id_6318217c0d95e0d2_fk_taggit_tag_id;
ALTER TABLE
kobocat=# truncate table taggit_tag;
TRUNCATE TABLE
Let's try copying the (inconsistent) data again:
root@newpg:/# pg_dump -U kobo -h 172.17.0.1 --data-only --inserts --table=taggit_tag kobotoolbox | psql -U postgres kobocat | grep -v '^INSERT 0 1$'
Password:
SET
SET
SET
SET
set_config
------------
(1 row)
SET
SET
SET
setval
--------
2985
(1 row)
This was very dissatisfying because I couldn't put the constraints back with such trash data. I decided just to clean it up (code ran in KoBoCAT's shell_plus
):
from __future__ import print_function, unicode_literals
from collections import Counter
from django.db import models
tt = list(Tag.objects.values_list('id', 'name', 'slug'))
dupenames = [x for x, count in Counter([x[1] for x in tt]).items() if count > 1]
dupeslugs = [x for x, count in Counter([x[2] for x in tt]).items() if count > 1]
class TagUid(models.Model):
""" hack to access KPI model """
tag = models.OneToOneField(Tag, on_delete=models.CASCADE)
class Meta:
app_label = 'kpi'
def clean_up(pks):
"""
reassign all related objects to use the first pk,
then delete every pk except the first one
"""
update_count = TaggedItem.objects.filter(tag_id__in=pks[1:]).update(tag_id=pks[0])
print('{}\t{}'.format(Tag.objects.filter(pk=pks[0]).values('pk', 'name', 'slug')[0], update_count))
# prune the duplicates
with transaction.atomic():
trash = Tag.objects.filter(pk__in=pks[1:]).select_for_update()
# annoying: KPI has an extra model related to Tag
TagUid.objects.filter(tag=trash).delete()
trash.delete()
for n in dupenames:
# `Tag.objects.filter(name=n)` will return only one result!
pks = sorted([x[0] for x in tt if x[1] == n])
clean_up(pks)
for s in dupeslugs:
pks = sorted([x[0] for x in tt if x[2] == s])
clean_up(pks)
Next, I truncated taggit_tag
on the destination database, re-ran pg_dump -U kobo -h 172.17.0.1 --data-only --inserts --table=taggit_tag kobotoolbox | psql -U postgres kobocat | grep -v '^INSERT 0 1$'
, and re-added the constraints:
ALTER TABLE ONLY public.taggit_tag ADD CONSTRAINT taggit_tag_name_key UNIQUE (name);
ALTER TABLE ONLY public.taggit_tag ADD CONSTRAINT taggit_tag_slug_key UNIQUE (slug);
ALTER TABLE ONLY public.taggit_taggeditem
ADD CONSTRAINT taggit_taggeditem_tag_id_6318217c0d95e0d2_fk_taggit_tag_id
FOREIGN KEY (tag_id) REFERENCES public.taggit_tag(id) DEFERRABLE INITIALLY DEFERRED;
Yikes, trying to copy over viewer_export
with pg_dump -U kobo -h 172.17.0.1 --data-only --inserts --table=viewer_export kobotoolbox | psql -U postgres kobocat
complains a lot about
HINT: You will need to rewrite or cast the expression.
ERROR: column "created_on" is of type timestamp with time zone but expression is of type integer
LINE 1: INSERT INTO public.viewer_export VALUES (132050, 94878, '201...
Evidently the columns are not in the same order!?!?!
This looks more promising:
root@newpg:/# pg_dump --help
pg_dump dumps a database as a text file or to other formats.
…
Options controlling the output content:
--column-inserts dump data as INSERT commands with column names
…
Let's truncate and try again:
psql -U postgres kobocat -c 'truncate table viewer_export;'
pg_dump -U kobo -h 172.17.0.1 --data-only --column-inserts --table=viewer_export kobotoolbox | psql -U postgres kobocat | grep -v '^INSERT 0 1$'
Result:
SET
SET
SET
SET
set_config
------------
(1 row)
SET
SET
SET
ERROR: duplicate key value violates unique constraint "viewer_export_xform_id_7a553159ed21a101_uniq"
DETAIL: Key (xform_id, filename)=(162955, ACTED_RRM_Registration_2018_01_04_23_24_30.xlsx) already exists.
ERROR: duplicate key value violates unique constraint "viewer_export_xform_id_7a553159ed21a101_uniq"
DETAIL: Key (xform_id, filename)=(249631, PDM_DFID_FSL_03072018_2018_07_15_09_36_59.xlsx) already exists.
ERROR: duplicate key value violates unique constraint "viewer_export_xform_id_7a553159ed21a101_uniq"
DETAIL: Key (xform_id, filename)=(277414, aewU7nheb6fhvErZH7ZGJS_2018_10_01_12_43_26.xlsx) already exists.
setval
--------
801166
(1 row)
I don't think there's any need to worry about these rows. By definition, if the identical xform-filename pair already exists, the user can already access the export.
Somehow, despite the errors, I get identical counts and maximum PKs from both databases:
kobotoolbox=# select count(*) from viewer_export;
count
--------
171825
(1 row)
kobotoolbox=# select max(id) from viewer_export;
max
--------
801169
(1 row)
kobocat=# select count(*) from viewer_export;
count
--------
171825
(1 row)
kobocat=# select max(id) from viewer_export;
max
--------
801169
(1 row)
Now we must turn to logger_surveytype
… which is related to big, bad logger_instance
.
It looks doubtful that this table is really used for anything:
kobocat$ ag -G py --ignore south_migrations survey_type
onadata/apps/logger/models/__init__.py
3:from onadata.apps.logger.models.survey_type import SurveyType
onadata/apps/logger/models/instance.py
19:from onadata.apps.logger.models.survey_type import SurveyType
118: survey_type = models.ForeignKey(SurveyType)
225: def _set_survey_type(self):
226: self.survey_type, created = \
375: self._set_survey_type()
onadata/apps/logger/migrations/0001_initial.py
128: name='survey_type',
onadata/libs/utils/common_tags.py
10:SURVEY_TYPE = u'_survey_type_slug'
The health of this table in the source database is questionable:
kobotoolbox=# select id from logger_surveytype where slug='CRF_tuky';
id
-------
73560
(1 row)
kobotoolbox=# select id from logger_surveytype where concat(slug,'') ='CRF_tuky';
id
-------
54836
73560
(2 rows)
Cleaning up logger_surveytype
, again using KoBoCAT's shell_plus
:
from __future__ import print_function, unicode_literals
from collections import Counter
sts = list(SurveyType.objects.values_list('pk', 'slug'))
dupes = [x for x, count in Counter([x[1] for x in sts]).items() if count > 1]
counter = 0
for dupe in dupes:
counter += 1
# `SurveyType.objects.filter(slug=dupe)` will return only one result!
pks = sorted([x[0] for x in sts if x[1] == dupe])
# reassign all related objects to use the first pk, then delete every pk
# except the first one
update_count = Instance.objects.filter(survey_type_id__in=pks[1:]).update(
survey_type_id=pks[0]
)
print('({}/{})\t{}\t{}'.format(counter, len(dupes), dupe, update_count))
# prune the duplicates
SurveyType.objects.filter(pk__in=pks[1:]).delete()
UPDATE
rewrites entire rows, mass updating survey_type_id
from logger_instance
ballooned disk consumption. After expanding the disk volume used by Postgres, the script completed successfully. I'm not going to worry about bloat in the source database (since we plan to discard it), but for future reference: https://www.keithf4.com/checking-for-postgresql-bloat/
Some tables from the original database haven't made it into either the new kobocat
or koboform
databases:
Table | Explanation |
---|---|
celery_taskmeta | Outdated? Current DBs use django_celery_… tables |
celery_taskmeta_id_seq | Outdated? Current DBs use django_celery_… tables |
celery_tasksetmeta | Outdated? Current DBs use django_celery_… tables |
celery_tasksetmeta_id_seq | Outdated? Current DBs use django_celery_… tables |
djcelery_crontabschedule | Outdated? Current DBs use django_celery_… tables |
djcelery_crontabschedule_id_seq | Outdated? Current DBs use django_celery_… tables |
djcelery_intervalschedule | Outdated? Current DBs use django_celery_… tables |
djcelery_intervalschedule_id_seq | Outdated? Current DBs use django_celery_… tables |
djcelery_periodictask | Outdated? Current DBs use django_celery_… tables |
djcelery_periodictask_id_seq | Outdated? Current DBs use django_celery_… tables |
djcelery_periodictasks | Outdated? Current DBs use django_celery_… tables |
djcelery_taskstate | Outdated? Current DBs use django_celery_… tables |
djcelery_taskstate_id_seq | Outdated? Current DBs use django_celery_… tables |
djcelery_workerstate | Outdated? Current DBs use django_celery_… tables |
djcelery_workerstate_id_seq | Outdated? Current DBs use django_celery_… tables |
koboform_surveydraft | Old dkobo table; we should back it up |
koboform_surveydraft_id_seq | Old dkobo table; we should back it up |
koboform_surveypreview | Old dkobo table; we should back it up |
koboform_surveypreview_id_seq | Old dkobo table; we should back it up |
main_sitewidemessage | Empty table; replaced by hub_sitewidemessage ? |
main_sitewidemessage_id_seq | Empty table; replaced by hub_sitewidemessage ? |
south_migrationhistory | Relic of the days when Django did not have built-in schema migration |
south_migrationhistory_id_seq | Relic of the days when Django did not have built-in schema migration |
Backup of old dkobo stuff:
root@newpg:/var/lib/postgresql/data# du -hs .
863G .
root@newpg:/var/lib/postgresql/data# mkdir old_dkobo_surveydraft_surveypreview_tables
root@newpg:/var/lib/postgresql/data# cd old_dkobo_surveydraft_surveypreview_tables
root@newpg:/var/lib/postgresql/data/old_dkobo_surveydraft_surveypreview_tables# pg_dump -U kobo -h 172.17.0.1 --table=koboform_surveydraft --table=koboform_surveypreview kobotoolbox | bzip2 > dump.sql.bz2
Password:
root@newpg:/var/lib/postgresql/data/old_dkobo_surveydraft_surveypreview_tables# du -h dump.sql.bz2
397M dump.sql.bz2
Python utility script to check the counts and maximum PKs of tables in the old and new databases:
import psycopg2
import sys
import threading
class DbThreadThing:
def __init__(self, dsn):
self.conn = psycopg2.connect(dsn)
def __del__(self):
self.conn.close()
def get_table_count_and_max_id(self, table):
with self.conn.cursor() as cur:
cur.execute("select count(*), max(id) from " + table)
self.result = cur.fetchone()
def start_query(self, table):
self.thread = threading.Thread(
target=self.get_table_count_and_max_id, args=(table,)
)
self.thread.start()
def wait_for_result(self):
self.thread.join()
return self.result
old = DbThreadThing("host=172.17.0.1 user=kobo password=righto dbname=kobotoolbox")
if sys.argv[1] == "kobocat":
new = DbThreadThing("user=postgres dbname=kobocat")
elif sys.argv[1] == "koboform":
new = DbThreadThing("user=postgres dbname=koboform")
table = sys.argv[2]
old.start_query(table)
new.start_query(table)
old_count, old_max = old.wait_for_result()
new_count, new_max = new.wait_for_result()
print(
" ".join(
[
str(x)
for x in [
table,
' ' * max(0, 50 - len(table)),
new_count - old_count,
new_max - old_max,
new_count,
old_count,
new_max,
old_max,
]
]
)
)
KoBoCAT's tables stopped replicating for some reason
Results for most KPI tables below. We need to deal with authtoken_token
separately, because it has no id
column. A handful of django_celery_beat_…
stuff I just didn't consider.
taggit_tag
:
auth_group_permissions 0 0 1 1 1 1
auth_permission 0 0 280 280 286 286
constance_config 0 0 8 8 8 8
django_admin_log 0 0 1783 1783 1879 1879
django_content_type 0 0 86 86 88 88
django_digest_partialdigest 0 0 292472 292472 384253 384253
django_digest_usernonce 0 0 150 150 10692085 10692085
django_migrations 0 0 166 166 199 199
external_integrations_corsmodel 0 0 3 3 3 3
help_inappmessage 0 0 3 3 3 3
help_inappmessagefile 0 0 1 1 1 1
help_inappmessageuserinteractions 0 0 19014 19014 19198 19198
hook_hook 0 0 596 596 785 785
hook_hooklog 0 0 712214 712214 716328 716328
hub_configurationfile 0 0 0 0 None None
hub_sitewidemessage 0 0 3 3 8 8
kpi_assetuserpartialpermission 0 0 1956 1956 19220 19220
kpi_authorizedapplication 0 0 0 0 None None
kpi_onetimeauthenticationkey 0 0 0 0 None None
oauth2_provider_accesstoken 0 0 915 915 34839 34839
oauth2_provider_application 0 0 57 57 96 96
oauth2_provider_grant 0 0 176 176 381 381
oauth2_provider_refreshtoken 0 0 915 915 34839 34839
auth_group 0 0 3 3 4 4
auth_user_groups 0 0 2 2 2 2
hub_extrauserdetail 0 0 135814 135814 136017 136017
hub_formbuilderpreference 0 0 36728 36728 36864 36864
hub_perusersetting 0 0 1 1 1 1
kpi_collection 0 0 6457 6457 7221 7221
kpi_taguid 0 0 1889 1889 2265 2265
kpi_usercollectionsubscription 0 0 2779 2779 4038 4038
registration_registrationprofile 0 0 146089 146089 146597 146597
auth_user 0 0 146113 146113 147331 147331
auth_user_user_permissions 0 0 7415960 7415960 7428499 7428499
kpi_assetfile 0 0 313 313 510 510
taggit_tag -2014 0 31 2045 3036 3036
taggit_taggeditem 0 0 78328 78328 265517 265517
kpi_asset 0 0 593178 593178 2350930 2350930
kpi_exporttask 0 0 311251 311251 546123 546123
kpi_importtask 0 0 422642 422642 755604 755604
kpi_assetsnapshot 0 0 772098 772098 1754232 1754232
kpi_assetversion 0 0 3750118 3750118 7837019 7837019
kpi_objectpermission 0 0 4104601 4104601 48305011 48305011
reversion_revision 0 0 49910339 49910339 84127042 84127042
reversion_version 0 0 49958377 49958377 84251592 84251592
real 13m7.476s
user 0m2.348s
sys 0m0.448s
Manual, row-by-row copy of taggit_tag
:
pg_dump -U kobo -h 172.17.0.1 --data-only --column-inserts --table=taggit_tag kobotoolbox | psql -U kobo koboform | grep -v '^INSERT 0 1$'
<various expected duplicates>
Checking again:
python3 count.py koboform taggit_tag
taggit_tag 0 0 -1 -1 3036 3036
Changed the frontends' config to use the new koboform
database, but not the new kobocat
database (which hasn't replicated properly).
Dropped the subscription and node on the new database:
koboform=# select pglogical.drop_subscription('kpi_db_subscription');
koboform=# select pglogical.drop_node('kpi_db_subscriber');
Dropped the replication set on the source:
select pglogical.drop_replication_set('kpi');
Mangled the source database so any frontend using it will fail immediately:
kobotoolbox=# select pglogical.drop_replication_set('kpi');
kobotoolbox=# alter table kpi_asset rename to DO_NOT_USE_kpi_asset;
Had to give up on the initial KoBoCAT replication effort. According to pg_stat_replication
, kobocat_db_subscription
would never get out of the catchup
state, even once sent_location
, write_location
, flush_location
, and replay_location
were all equal. After sitting for a while, the pglogical
process would die, kobocat_db_subscription
would disappear from pg_stat_replication
, and the process would repeat, with sent_location
always jumping back to the same location. This kind of stuff appeared repeatedly in the destination database's log:
LOG: starting apply for subscription kobocat_db_subscription
LOG: worker process: pglogical apply 50039343:725963785 (PID 44) exited with exit code 1
I removed all tables except logger_instance
from the replication set on the source database and dropped them from the destination; even then, it would not catch up. I gave up, dropped the subscription on the destination, truncated the incomplete logger_instance
, and restarted replication for logger_instance
alone. If it succeeds, I'll add the other tables.
Woohoo, we're streaming logger_instance
again! This took roughly 40 hours.
kobotoolbox=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid | 10895
usesysid | 16384
usename | kobo
application_name | kobocat_db_subscription
client_addr | 172.18.0.1
client_hostname |
client_port | 35572
backend_start | 2020-01-30 23:09:37.212477+00
backend_xmin |
state | streaming
sent_location | B6C/79C06AA0
write_location | B6C/79C06AA0
flush_location | B6C/79C06AA0
replay_location | B6C/79C06AA0
sync_priority | 0
sync_state | async
taggit_tag
and viewer_export
need attention:
api_project 0 0 3 3 3 3
api_project_user_stars 0 0 0 0 None None
api_projectxform 0 0 0 0 None None
api_team_projects 0 0 0 0 None None
corsheaders_corsmodel 0 0 0 0 None None
django_admin_log 0 0 1783 1783 1879 1879
django_content_type 0 0 86 86 88 88
django_migrations 0 0 166 166 199 199
django_site 0 0 2 2 2 2
oauth2_provider_accesstoken 0 0 915 915 34839 34839
oauth2_provider_application 0 0 57 57 96 96
oauth2_provider_grant 0 0 176 176 381 381
oauth2_provider_refreshtoken 0 0 915 915 34839 34839
stats_statscount 0 0 118896 118896 118896 118896
auth_group 0 0 3 3 4 4
auth_group_permissions 0 0 1 1 1 1
auth_permission 0 0 280 280 286 286
auth_user_groups 0 0 2 2 2 2
django_digest_partialdigest 0 0 294197 294197 387352 387352
django_digest_usernonce 0 0 123 123 10811021 10811021
guardian_groupobjectpermission 0 0 0 0 None None
logger_ziggyinstance 0 0 0 0 None None
main_metadata 0 0 617539 617539 1170046 1170046
restservice_restservice 0 0 494 494 1886 1886
taggit_tag -2045 0 0 2045 None 3036
taggit_taggeditem 0 0 78328 78328 265517 265517
auth_user 0 0 146962 146962 148979 148979
auth_user_user_permissions 0 0 7446560 7446560 7459099 7459099
logger_note 0 0 15096 15096 22395 22395
logger_surveytype 0 0 213373 213373 230971 230971
main_userprofile 0 0 114244 114244 115040 115040
registration_registrationprofile 0 0 146091 146091 146599 146599
viewer_columnrename 0 0 0 0 None None
guardian_userobjectpermission 0 0 5682070 5682070 6856957 6856957
logger_attachment 0 0 29808595 29808595 30387456 30387456
logger_instancehistory 0 0 936073 936073 987852 987852
logger_xform 0 0 242753 242753 486723 486723
viewer_export -172609 0 647 173256 805236 805236
viewer_instancemodification 0 0 0 0 None None
logger_instance 0 0 80195164 80195164 83078465 83078465
reversion_revision 0 0 50550247 50550247 84766950 84766950
reversion_version 0 0 50600598 50600598 84893813 84893813
viewer_parsedinstance 0 0 80197715 80197715 83017168 83017168
This comparison of counts and largest IDs for each table took about 30 minutes to run:
real 28m36.392s
user 0m2.320s
sys 0m0.348s
Manually copied taggit_tag
data with:
pg_dump -U kobo -h 172.17.0.1 --data-only --column-inserts --table=taggit_tag kobotoolbox | psql -U kobo kobocat | grep -v '^INSERT 0 1$'
…but saw some familiar issues:
ERROR: duplicate key value violates unique constraint "taggit_tag_name_key"
Counts were slightly off as a result:
# python3 count.py kobocat taggit_tag
taggit_tag -19 -39 2026 2045 2997 3036
Re-ran clean up script from #265 (comment); now counts match:
# python3 count.py kobocat taggit_tag
taggit_tag 0 0 2026 2026 2997 2997
Manually copying viewer_export
:
pg_dump -U kobo -h 172.17.0.1 --data-only --column-inserts --table=viewer_export kobotoolbox | psql -U kobo kobocat | grep -v '^INSERT 0 1$'
…revealed expected duplicates from replication:
ERROR: duplicate key value violates unique constraint "viewer_export_pkey"
…but also some unexpected unique constraint violations:
ERROR: duplicate key value violates unique constraint "viewer_export_xform_id_7a553159ed21a101_uniq"
DETAIL: Key (xform_id, filename)=(162955, <redacted>.xlsx) already exists.
Unsurprisingly, the counts did not match:
root@postgres:/var/lib/postgresql/data/jnm_zomg# python3 count.py kobocat viewer_export
viewer_export -3 0 174519 174522 808994 808994
The source database was showing inconsistencies that you don't like to see:
kobotoolbox=# select id, xform_id, filename from viewer_export where xform_id=162955 and filename='<redacted>.xlsx';
id | xform_id | filename
--------+----------+-------------------------------------------------
226812 | 162955 | <redacted>.xlsx
(1 row)
kobotoolbox=# select id, xform_id, filename from viewer_export where xform_id=162955 and concat(filename, '')='<redacted>.xlsx';
id | xform_id | filename
--------+----------+-------------------------------------------------
226812 | 162955 | <redacted>.xlsx
226813 | 162955 | <redacted>.xlsx
(2 rows)
There were only three of these, so I cleaned them manually by deleting the duplicates. Previously, I didn't care about them, but I became worried they might interfere with replication. Counts were still off by one, so I adapted the count/max(id) Python script to find the missing ID:
import psycopg2
import sys
import threading
class DbThreadThing:
def __init__(self, dsn):
self.conn = psycopg2.connect(dsn)
def __del__(self):
self.conn.close()
def get_all_ids_from_table(self, table):
with self.conn.cursor() as cur:
cur.execute("select id from " + table)
self.result = cur.fetchall()
def start_query(self, table):
self.thread = threading.Thread(
target=self.get_all_ids_from_table, args=(table,)
)
self.thread.start()
def wait_for_result(self):
self.thread.join()
return self.result
old = DbThreadThing("host=172.17.0.1 user=kobo password=righto dbname=kobotoolbox")
if sys.argv[1] == "kobocat":
new = DbThreadThing("user=postgres dbname=kobocat")
elif sys.argv[1] == "koboform":
new = DbThreadThing("user=postgres dbname=koboform")
table = sys.argv[2]
old.start_query(table)
new.start_query(table)
old_ids = old.wait_for_result()
new_ids = new.wait_for_result()
def unwrap_ids(ids):
return [t[0] for t in ids]
print(set(unwrap_ids(old_ids)).symmetric_difference(unwrap_ids(new_ids)))
…and manually added it to the destination database.
Needed to add sequences for KC (list obtained from \ds
in the new KC database; commands below executed on the source database):
select pglogical.replication_set_add_sequence('kobocat', 'api_project_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'api_project_user_stars_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'api_projectxform_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'api_team_projects_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_group_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_group_permissions_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_permission_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_user_groups_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_user_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'auth_user_user_permissions_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'corsheaders_corsmodel_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_admin_log_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_celery_beat_crontabschedule_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_celery_beat_intervalschedule_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_celery_beat_periodictask_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_celery_beat_solarschedule_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_content_type_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_digest_partialdigest_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_digest_usernonce_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_migrations_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'django_site_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'guardian_groupobjectpermission_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'guardian_userobjectpermission_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_attachment_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_instance_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_instancehistory_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_note_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_surveytype_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_xform_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'logger_ziggyinstance_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'main_metadata_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'main_userprofile_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'oauth2_provider_accesstoken_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'oauth2_provider_application_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'oauth2_provider_grant_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'oauth2_provider_refreshtoken_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'registration_registrationprofile_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'restservice_restservice_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'reversion_revision_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'reversion_version_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'stats_statscount_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'taggit_tag_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'taggit_taggeditem_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'viewer_columnrename_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'viewer_export_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'viewer_instancemodification_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'viewer_parsedinstance_id_seq', synchronize_data := true);
select pglogical.replication_set_add_sequence('kobocat', 'topology_id_seq', synchronize_data := true);
Cut KoBoCAT over to new database successfully:
https://github.com/kobotoolbox/kobo-deployments/commit/55d908a5382ddf2da4a123342c01c46c5670f201
https://github.com/kobotoolbox/kobo-deployments/commit/9e68590de84ffde1902ca5c0fd0ad14821e075c0
Dropped node and subscription from new database (warning isn't surprising given that the source database has been stopped):
kobocat=# select pglogical.drop_subscription('kobocat_db_subscription');
WARNING: could not drop slot "pgl_kobocat_combined12d34dc_kobocat_2b45540" on provider, you will probably have to drop it manually
drop_subscription
-------------------
1
(1 row)
kobocat=# select pglogical.drop_node('kobocat_db_subscriber');
drop_node
-----------
t
(1 row)
I'd removed some constraints from the new KoBoCAT database so that bad data in the source wouldn't break replication. I didn't restore the constraints, however, and people started getting 500 errors when trying to submit because get_or_create()
isn't atomic:
https://github.com/kobotoolbox/kobocat/blob/cdfe325670cc7ba86b87311792309693891750fe/onadata/apps/logger/models/instance.py#L225-L227
MultipleObjectsReturned/{username}/submission
error
get() returned more than one SurveyType -- it returned 2!
Fixed by cleaning up the database with the Python code in http://r.tmoj.net/https://github.com/kobotoolbox/kobo-docker/issues/265#issuecomment-577833875 and adding the unique constraint:
kobocat=# ALTER TABLE ONLY public.logger_surveytype
ADD CONSTRAINT logger_surveytype_slug_key UNIQUE (slug);
ALTER TABLE
kobocat=#
A diff of the pg_dump --schema-only
between the new OCHA KoBoCAT database and a fresh one shows that some other constraints also need to be added:
ALTER TABLE ONLY public.logger_instance
ADD CONSTRAINT logger__survey_type_id_5a58f1cfec1771cd_fk_logger_surveytype_id FOREIGN KEY (survey_type_id) REFERENCES public.logger_surveytype(id) DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE ONLY public.logger_instance
ADD CONSTRAINT logger_instance_user_id_3a3b162ef6bedc0e_fk_auth_user_id FOREIGN KEY (user_id) REFERENCES public.auth_user(id) DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE ONLY public.logger_instance
ADD CONSTRAINT logger_instance_xform_id_546567e950972f94_fk_logger_xform_id FOREIGN KEY (xform_id) REFERENCES public.logger_xform(id) DEFERRABLE INITIALLY DEFERRED;
Adding those constraints locks the table (ouch), so we can make them NOT VALID
—where they apply only to new records—and then VALIDATE CONSTRAINT
as a non-locking job later.
Adding the three NOT VALID
constraints:
kobocat=# \timing
Timing is on.
kobocat=# ALTER TABLE ONLY public.logger_instance
ADD CONSTRAINT logger__survey_type_id_5a58f1cfec1771cd_fk_logger_surveytype_id FOREIGN KEY (survey_type_id) REFERENCES public.logger_surveytype(id) DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE
Time: 64.423 ms
kobocat=# ALTER TABLE ONLY public.logger_instance
ADD CONSTRAINT logger_instance_user_id_3a3b162ef6bedc0e_fk_auth_user_id FOREIGN KEY (user_id) REFERENCES public.auth_user(id) DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE
Time: 1.371 ms
kobocat=# ALTER TABLE ONLY public.logger_instance
kobocat-# ADD CONSTRAINT logger_instance_xform_id_546567e950972f94_fk_logger_xform_id FOREIGN KEY (xform_id) REFERENCES public.logger_xform(id) DEFERRABLE INITIALLY DEFERRED
kobocat-# NOT VALID;
ALTER TABLE
Time: 1.907 ms
Validating them, which is slow but does not require blocking writes to the table:
kobocat=# ALTER TABLE ONLY public.logger_instance VALIDATE CONSTRAINT logger_instance_xform_id_546567e950972f94_fk_logger_xform_id;
ALTER TABLE
Time: 669835.189 ms
kobocat=# ALTER TABLE ONLY public.logger_instance
kobocat-# VALIDATE CONSTRAINT logger_instance_user_id_3a3b162ef6bedc0e_fk_auth_user_id;
ALTER TABLE
Time: 642934.770 ms
kobocat=# ALTER TABLE ONLY public.logger_instance
kobocat-# VALIDATE CONSTRAINT logger__survey_type_id_5a58f1cfec1771cd_fk_logger_surveytype_id;
ALTER TABLE
Time: 627803.682 ms
A diff of pg_dump --schema-only
between the OCHA KoBoCAT database and a fresh KC database now looks good.