reactive-tech/kubegres

Kubegres pods are restarting again and again and creating new replicas

richamishra006 opened this issue · 1 comment

Hi Team, I have deployed Kubegres with three replicas. The pods are currently named something like postgresql-32-0, postgresql-34-0, postgresql-35-0. The pods keep restarting and new replicas keep being created, and I am unable to find what is causing this.
I am adding the logs here:

2022-07-25 02:01:14.159 GMT [1] LOG:  starting PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2022-07-25 02:01:14.190 GMT [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-07-25 02:01:14.190 GMT [1] LOG:  listening on IPv6 address "::", port 5432
2022-07-25 02:01:14.296 GMT [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-07-25 02:01:14.504 GMT [31] LOG:  database system was interrupted; last known up at 2022-07-25 01:59:40 GMT
2022-07-25 02:01:15.089 GMT [31] LOG:  entering standby mode
2022-07-25 02:01:15.218 GMT [31] LOG:  redo starts at 5/E4000028
2022-07-25 02:01:17.128 GMT [31] LOG:  consistent recovery state reached at 5/E46CD688
2022-07-25 02:01:17.138 GMT [1] LOG:  database system is ready to accept read only connections
2022-07-25 02:01:17.464 GMT [41] LOG:  started streaming WAL from primary at 5/E5000000 on timeline 21
2022-07-25 04:01:48.575 GMT [13264] ERROR:  canceling statement due to conflict with recovery
2022-07-25 04:01:48.575 GMT [13264] DETAIL:  User query might have needed to see row versions that must be removed.
2022-07-25 04:01:48.575 GMT [13264] STATEMENT:  COPY public.reversion_version (id, object_id, format, serialized_data, object_repr, content_type_id, revision_id, db) TO stdout;
2022-07-25 05:08:07.106 GMT [41] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
		This probably means the server terminated abnormally
		before or while processing the request.
2022-07-25 05:08:07.107 GMT [31] LOG:  invalid resource manager ID 32 at 6/754D1B0
2022-07-25 05:08:07.237 GMT [20697] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:08:12.121 GMT [20717] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:08:17.130 GMT [20727] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:08:22.177 GMT [20742] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:08:27.155 GMT [20749] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:08:32.146 GMT [20764] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:08:37.151 GMT [20765] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:08:42.163 GMT [20766] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:08:47.205 GMT [20767] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:08:52.172 GMT [20768] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:08:57.170 GMT [20769] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:09:02.187 GMT [20770] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:09:07.183 GMT [20771] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:09:12.193 GMT [20772] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:09:17.203 GMT [20812] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 05:09:22.213 GMT [20813] LOG:  fetching timeline history file for timeline 22 from primary server
2022-07-25 05:09:22.251 GMT [20813] LOG:  started streaming WAL from primary at 6/7000000 on timeline 21
2022-07-25 05:09:22.418 GMT [20813] LOG:  replication terminated by primary server
2022-07-25 05:09:22.418 GMT [20813] DETAIL:  End of WAL reached on timeline 21 at 6/754D1B0.
2022-07-25 05:09:22.427 GMT [31] LOG:  new target timeline is 22
2022-07-25 05:09:22.534 GMT [20813] LOG:  restarted WAL streaming at 6/7000000 on timeline 22
2022-07-25 06:00:36.857 GMT [24859] LOG:  duration: 7022.961 ms  statement: COPY public.campaign (id, created, modified, filters, suid, title, priority, status, actions, action_type, substitutes, groups, type, active, smart, monitoring, weekdays, start_at, end_at, best_time, activation, timezone, activates_at, expires_at, finished, feed, app_id, creator_id, segment_id, rematch_repeat, rematch_duration, target, metric, champion, split_id, product, feed_deletion_conditions, tag_filters, goal_id, operating_system, goals, feed_repeat_conditions, last_editor_id, trigger_conditions) TO stdout;
2022-07-25 06:02:21.215 GMT [24859] LOG:  duration: 99503.629 ms  statement: COPY public.reversion_version (id, object_id, format, serialized_data, object_repr, content_type_id, revision_id, db) TO stdout;
2022-07-25 06:12:07.603 GMT [20813] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
		This probably means the server terminated abnormally
		before or while processing the request.
2022-07-25 06:12:07.604 GMT [31] LOG:  record with incorrect prev-link 636B71A1/910 at 6/136DC6A8
2022-07-25 06:12:07.778 GMT [26358] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 06:12:12.678 GMT [26375] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 06:12:17.681 GMT [26384] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 06:12:22.640 GMT [26406] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 06:12:27.761 GMT [26414] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 06:12:34.846 GMT [26426] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 06:12:38.093 GMT [26433] FATAL:  could not connect to the primary server: could not translate host name "pointzi-postgresql" to address: Name or service not known
2022-07-25 06:12:39.203 GMT [1] LOG:  received fast shutdown request
2022-07-25 06:12:39.305 GMT [1] LOG:  aborting any active transactions
2022-07-25 06:12:39.932 GMT [32] LOG:  shutting down
2022-07-25 06:12:40.281 GMT [1] LOG:  database system is shut down
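
The recurring FATAL message in the log above is a name-resolution failure for the host "pointzi-postgresql", not a PostgreSQL crash on this replica. As a rough sketch for narrowing it down (assuming the log has been saved as plain text in the same format as above; the function name `dns_outage_windows` and the 60-second grouping gap are illustrative choices, not part of Kubegres or PostgreSQL), the script below groups those lines into outage windows so they can be compared against pod restart times:

```python
import re
from datetime import datetime

# Matches the log prefix seen above: "<timestamp> GMT [pid] LEVEL:  message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ GMT \[\d+\] (\w+):\s+(.*)$"
)
DNS_ERROR = "could not translate host name"


def dns_outage_windows(log_text, max_gap_seconds=60):
    """Group consecutive DNS-failure FATAL lines into (start, end, count) windows.

    max_gap_seconds is an assumed threshold: failures further apart than this
    are treated as separate outages.
    """
    windows = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines without a timestamp are skipped
        stamp, level, message = match.groups()
        if level != "FATAL" or DNS_ERROR not in message:
            continue
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if windows and (when - windows[-1][1]).total_seconds() <= max_gap_seconds:
            start, _, count = windows[-1]
            windows[-1] = (start, when, count + 1)  # extend the current window
        else:
            windows.append((when, when, 1))  # start a new window
    return windows


if __name__ == "__main__":
    import sys

    # Usage: python dns_windows.py < replica.log
    for start, end, count in dns_outage_windows(sys.stdin.read()):
        print(f"{start} -> {end}: {count} failed lookups")
```

If the windows line up with the times the primary pod was replaced, that would suggest the primary service name briefly had no resolvable address during failover; this is only a hypothesis to check, not a confirmed cause.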

Please help me with this; any suggestions will be highly appreciated.

teebu commented

How did you solve this? We're also seeing "could not connect to the primary server".