omniti-labs/omnipitr

Backup process not terminating.

Ferada opened this issue · 2 comments

Another problem came up while running the actual backup part,
i.e. /var/lib/postgresql/omnipitr/bin/omnipitr-backup-slave --config=/var/lib/postgresql/omnipitr-backup.config,
that is the script didn't find the final compressed file.

Edit: So it seems that since the files have a .gz at the end the regex doesn't match anymore. Unfortunately I can't quite figure out which of the regexes is the problem, however manually renaming the file helps to complete the backup:

2015-03-09 12:31:01.967071 +0000 : 26776 : omnipitr-backup-slave : LOG : File 00000001000001AB000000B1.0054C6B8.backup arrived after 70 seconds.
2015-03-09 12:31:02.214475 +0000 : 26776 : omnipitr-backup-slave : LOG : Script to make tarballs:
2015-03-09 12:31:02.214475 +0000 : 26776 : omnipitr-backup-slave : LOG : mkfifo \/tmp\/CommandPiper\-26776\-IJn1SH\/fifo\-0 \/tmp\/CommandPiper\-26776\-IJn1SH\/fifo\-1
2015-03-09 12:31:02.214475 +0000 : 26776 : omnipitr-backup-slave : LOG : nice \/var\/lib\/postgresql\/omnipitr\/bin\/omnipitr\-checksum \-d SHA\-1 \-f event\-database\-readonly\-xlog\-2015\-03\-09\.tar\.gz < \/tmp\/CommandPiper\-26776\-IJn1SH\/fifo\-0 >> \/var\/lib\/postgresql\/data\/9\.3\/backups\/event\-database\-readonly\-SHA\-1\-2015\-03\-09\.tar\.gz &2015-03-09 12:31:02.214475 +0000 : 26776 : omnipitr-backup-slave : LOG : nice gzip \-\-stdout \- < \/tmp\/CommandPiper\-26776\-IJn1SH\/fifo\-1 | tee \/var\/lib\/postgresql\/data\/9\.3\/backups\/event\-database\-readonly\-xlog\-2015\-03\-09\.tar\.gz > \/tmp\/CommandPiper\-26776\-IJn1SH\/fifo\-0 &
2015-03-09 12:31:02.214475 +0000 : 26776 : omnipitr-backup-slave : LOG : nice tar cf \- \-\-transform\=s\#\^\\\(archive\\\|tmp\/omnipitr\-backup\-slave\/26776\\\)\#main\/pg_xlog\# archive\/00000001000001AB000000B1\.0054C6B8\.backup archive\/00000001000001AB000000B1\.0054C6B8\.backup\.gz archive\/00000001000001AB000000B1\.gz archive\/00000001000001AB000000B2\.gz archive\/00000001000001AB000000B3\.gz archive\/00000001000001AB000000B4\.gz archive\/00000001000001AB000000B5\.gz archive\/00000001000001AB000000B6\.gz archive\/00000001000001AB000000B7\.gz archive\/00000001000001AB000000B8\.gz archive\/00000001000001AB000000B9\.gz archive\/00000001000001AB000000BA\.gz archive\/00000001000001AB000000BB\.gz archive\/00000001000001AB000000BC\.gz archive\/00000001000001AB000000BD\.gz archive\/00000001000001AB000000BE\.gz archive\/00000001000001AB000000BF\.gz archive\/00000001000001AB000000C0\.gz archive\/00000001000001AB000000C1\.gz archive\/00000001000001AB000000C2\.gz archive\/00000001000001AB000000C3\.gz archive\/00000001000001AB000000C4\.gz archive\/00000001000001AB000000C5\.gz archive\/00000001000001AB000000C6\.gz archive\/00000001000001AB000000C7\.gz archive\/00000001000001AB000000C8\.gz archive\/00000001000001AB000000C9\.gz archive\/00000001000001AB000000CA\.gz archive\/00000001000001AB000000CB\.gz archive\/00000001000001AB000000CC\.gz archive\/00000001000001AB000000CD\.gz archive\/00000001000001AB000000CE\.gz archive\/00000001000001AB000000CF\.gz archive\/00000001000001AB000000D0\.gz archive\/00000001000001AB000000D1\.gz archive\/00000001000001AB000000D2\.gz archive\/00000001000001AB000000D3\.gz archive\/00000001000001AB000000D4\.gz archive\/00000001000001AB000000D5\.gz archive\/00000001000001AB000000D6\.gz archive\/00000001000001AB000000D7\.gz archive\/00000001000001AB000000D8\.gz archive\/00000001000001AB000000D9\.gz archive\/00000001000001AB000000DA\.gz archive\/00000001000001AB000000DB\.gz archive\/00000001000001AB000000DC\.gz archive\/00000001000001AB000000DD\.gz archive\/00000001000001AB000000DE\.gz archive\/00000001000001AB000000DF\.gz archive\/00000001000001AB000000E0\.gz archive\/00000001000001AB000000E1\.gz archive\/00000001000001AB000000E2\.gz archive\/00000001000001AB000000E3\.gz archive\/00000001000001AB000000E4\.gz archive\/00000001000001AB000000E5\.gz archive\/00000001000001AB000000E6\.gz archive\/00000001000001AB000000E7\.gz archive\/00000001000001AB000000E8\.gz archive\/00000001000001AB000000E9\.gz archive\/00000001000001AB000000EA\.gz archive\/00000001000001AB000000EB\.gz archive\/00000001000001AB000000EC\.gz archive\/00000001000001AB000000ED\.gz archive\/00000001000001AB000000EE\.gz archive\/00000001000001AB000000EF\.gz archive\/00000001000001AB000000F0\.gz archive\/00000001000001AB000000F1\.gz archive\/00000001000001AB000000F2\.gz archive\/00000001000001AB000000F3\.gz 2> \/tmp\/omnipitr\-backup\-slave\/26776\/tar\.stderr > \/tmp\/CommandPiper\-26776\-IJn1SH\/fifo\-1
2015-03-09 12:31:02.214475 +0000 : 26776 : omnipitr-backup-slave : LOG : wait
2015-03-09 12:31:02.214475 +0000 : 26776 : omnipitr-backup-slave : LOG : rm \/tmp\/CommandPiper\-26776\-IJn1SH\/fifo\-0 \/tmp\/CommandPiper\-26776\-IJn1SH\/fifo\-1
2015-03-09 12:31:22.342530 +0000 : 26776 : omnipitr-backup-slave : LOG : Timer [Compressing xlogs] took: 20.373s
2015-03-09 12:31:22.345004 +0000 : 26776 : omnipitr-backup-slave : LOG : Timer [Making xlog archive] took: 91.009s
2015-03-09 12:31:22.346628 +0000 : 26776 : omnipitr-backup-slave : LOG : Timer [Delivering to all remote destinations] took: 0.000s
2015-03-09 12:31:22.347817 +0000 : 26776 : omnipitr-backup-slave : LOG : Timer [Delivering meta files] took: 0.000s
2015-03-09 12:31:22.348567 +0000 : 26776 : omnipitr-backup-slave : LOG : Timer [Whole backup procedure] took: 2402.708s
2015-03-09 12:31:22.349194 +0000 : 26776 : omnipitr-backup-slave : LOG : All done.

Original log till there:

2015-03-06 18:30:48.664121 +0000 : 10897 : omnipitr-backup-slave : LOG : Called with parameters: --config=/var/lib/postgresql/omnipitr-backup.config
2015-03-06 18:33:37.811151 +0000 : 10897 : omnipitr-backup-slave : LOG : Timer [SELECT w, pg_xlogfile_name(w) from (select pg_start_backup('omnipitr_slave_backup_with_master_callback') as w ) as x] took: 169.128s
2015-03-06 18:33:37.812726 +0000 : 10897 : omnipitr-backup-slave : LOG : pg_start_backup('omnipitr') returned 191/CA5675F0|0000000100000191000000CA.
2015-03-06 18:33:38.020786 +0000 : 10897 : omnipitr-backup-slave : LOG : Timer [select pg_read_file( 'backup_label', 0, ( pg_stat_file( 'backup_label' ) ).size )] took: 0.207s
2015-03-06 18:33:38.022049 +0000 : 10897 : omnipitr-backup-slave : LOG : Waiting for checkpoint (based on backup_label from master) - CHECKPOINT LOCATION: 191/CBCF5388
2015-03-06 18:42:03.844201 +0000 : 10897 : omnipitr-backup-slave : LOG : Checkpoint .
2015-03-06 18:42:03.848004 +0000 : 10897 : omnipitr-backup-slave : LOG : Script to make tarballs:
2015-03-06 18:42:03.848004 +0000 : 10897 : omnipitr-backup-slave : LOG : mkfifo \/tmp\/CommandPiper\-10897\-4MO5hr\/fifo\-0 \/tmp\/CommandPiper\-10897\-4MO5hr\/fifo\-1
2015-03-06 18:42:03.848004 +0000 : 10897 : omnipitr-backup-slave : LOG : nice \/var\/lib\/postgresql\/omnipitr\/bin\/omnipitr\-checksum \-d SHA\-1 \-f database\-readonly\-data\-2015\-03\-06\.tar\.gz < \/tmp\/CommandPiper\-10897\-4MO5hr\/fifo\-0 >> \/var\/lib\/postgresql\/data\/9\.3\/backups\/database\-readonly\-SHA\-1\-2015\-03\-06\.tar\.gz &
2015-03-06 18:42:03.848004 +0000 : 10897 : omnipitr-backup-slave : LOG : nice gzip \-\-stdout \- < \/tmp\/CommandPiper\-10897\-4MO5hr\/fifo\-1 | tee \/var\/lib\/postgresql\/data\/9\.3\/backups\/database\-readonly\-data\-2015\-03\-06\.tar\.gz > \/tmp\/CommandPiper\-10897\-4MO5hr\/fifo\-0 &
2015-03-06 18:42:03.848004 +0000 : 10897 : omnipitr-backup-slave : LOG : nice tar cf \- \-\-exclude\=main\/pg_log\/\* \-\-exclude\=main\/pg_xlog\/0\* \-\-exclude\=main\/pg_xlog\/archive_status\/\* \-\-exclude\=main\/recovery\.conf \-\-exclude\=main\/postmaster\.pid \-\-transform\=s\#\^tmp\/omnipitr\-backup\-slave\/10897\/\#main\/\# main \/tmp\/omnipitr\-backup\-slave\/10897\/backup_label 2> \/tmp\/omnipitr\-backup\-slave\/10897\/tar\.stderr > \/tmp\/CommandPiper\-10897\-4MO5hr\/fifo\-1
2015-03-06 18:42:03.848004 +0000 : 10897 : omnipitr-backup-slave : LOG : wait
2015-03-06 18:42:03.848004 +0000 : 10897 : omnipitr-backup-slave : LOG : rm \/tmp\/CommandPiper\-10897\-4MO5hr\/fifo\-0 \/tmp\/CommandPiper\-10897\-4MO5hr\/fifo\-1
2015-03-06 19:07:23.018311 +0000 : 10897 : omnipitr-backup-slave : LOG : tar stderr:
2015-03-06 19:07:23.021047 +0000 : 10897 : omnipitr-backup-slave : LOG : ==============================================
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1194911: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1194905: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1195067.1: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1195111.3: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1195075: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1132077: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1195117: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1195048.1: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1195062.2: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1195076: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1132083: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1195119: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1195042: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1194953.2: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1194934: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1324262: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1195077: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1195069: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1253157: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/353151/1194950.1: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: main/base/pgsql_tmp: file changed as we read it
2015-03-06 19:07:23.021850 +0000 : 10897 : omnipitr-backup-slave : LOG : tar: Removing leading `/' from member names
2015-03-06 19:07:23.023059 +0000 : 10897 : omnipitr-backup-slave : LOG : ==============================================
2015-03-06 19:07:23.024152 +0000 : 10897 : omnipitr-backup-slave : LOG : Timer [Compressing $PGDATA] took: 1519.178s
2015-03-06 19:07:34.314419 +0000 : 10897 : omnipitr-backup-slave : LOG : Timer [SELECT pg_stop_backup()] took: 11.289s
2015-03-06 19:07:34.315914 +0000 : 10897 : omnipitr-backup-slave : LOG : pg_stop_backup() returned 192/104A49C8.
2015-03-06 19:07:34.316721 +0000 : 10897 : omnipitr-backup-slave : LOG : Timer [Making data archive] took: 2205.645s
2015-03-06 19:07:44.327166 +0000 : 10897 : omnipitr-backup-slave : LOG : Waiting for file matching $VAR1 = qr/(?^:\A[0-9A-F]{8}00000191000000CA\.005675F0\.backup\z)/;
2015-03-06 19:07:44.327166 +0000 : 10897 : omnipitr-backup-slave : LOG :  in directory /var/lib/postgresql/data/9.3/archive
2015-03-06 20:07:38.912364 +0000 : 10897 : omnipitr-backup-slave : FATAL : Waited 1 hour for file matching $VAR1 = qr/(?^:\A[0-9A-F]{8}00000191000000CA\.005675F0\.backup\z)/;
2015-03-06 20:07:38.912364 +0000 : 10897 : omnipitr-backup-slave : FATAL : , but it did not appear. Something is wrong. No sense in waiting longer.

The contents of that directory at the moment is approximately:

/var/lib/postgresql/data/9.3/archive/:
00000001000001900000002B.gz
...
0000000100000190000000BA.gz
0000000100000190000000BB.004E64F0.backup.gz
0000000100000190000000BB.gz
...
0000000100000190000000CC.gz
0000000100000190000000CD.000017B0.backup.gz
0000000100000190000000CD.gz
...
0000000100000190000000F4.gz
0000000100000190000000F5.00E1C9F0.backup.gz
...
0000000100000191000000C9.gz
0000000100000191000000CA.005675F0.backup.gz <--- should match?
0000000100000191000000CA.gz
...
00000001000001AA000000F0.gz

So it looks to me like the file was created, but the regex, or what
it's using to find that file, isn't triggered.

Just for reference, the backup configuration uses:

--data-dir=/var/lib/postgresql/9.3/main/
--verbose
--log=/var/log/postgresql/omnipitr.log
--dst-local=gzip=/var/lib/postgresql/data/9.3/backups/
--call-master
--user=replication
--database=template1
--digest=SHA-1
--host=...
--source=/var/lib/postgresql/data/9.3/archive/
--pgcontroldata-path=/usr/lib/postgresql/9.3/bin/pg_controldata

Well, if your xlog source is compressed, you should mark it as such.Just like docs tell you to:

You can also specify compression for source. Check COMPRESSION section of the doc.

So you should prefix your source path with gzip=

Oh wow I missed that completely. Thanks, works just fine now.