WARNING:pgsync.sync: Checkpoint file not found

Question


jadsy2107 opened this issue a year ago · 5 comments

jadsy2107 commented a year ago

PGSync version:
Latest git branch 2.5.0

Postgres version:
13

Elasticsearch version:
8.9.2

Redis version:
docker redis:alpine

Python version:
docker python3.11-slim

Problem Description:
Checkpoint file keeps getting deleted.

Error Message (if any):
2023-09-09 06:57:28.047:WARNING:pgsync.sync: Checkpoint file not found: /usr/src/app/.fusionpbx_v_xml_cdr

Dockerfile

FROM python:3.11-slim

WORKDIR /usr/src/app

RUN apt update \
    && apt install -y moreutils \
    && apt install -y jq \
    && apt install -y git \
    && apt install -y wait-for-it

RUN pip install git+https://github.com/toluaina/pgsync.git

ENTRYPOINT ["bash", "./entrypoint.sh"]

docker-compose.yml

version: '3.8'
services:
  elasticsearch:
    image: elasticsearch:8.9.2
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - '9200:9200'
    volumes:
        # mkdir elasticsearch && chown -R 1000:1000 elasticsearch
      - ./elasticsearch:/usr/share/elasticsearch/data

  redis:
    image: 'redis:alpine'
    container_name: redis

  pgsync:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: pgsync
    sysctls:
      - net.ipv4.tcp_keepalive_time=200
      - net.ipv4.tcp_keepalive_intvl=200
      - net.ipv4.tcp_keepalive_probes=5
    volumes:
        # chown 1000:1000 pgsync
      - ./pgsync:/usr/src/app
    environment:
      - PG_USER=fusionpbx
      - PG_PASSWORD=XXX
      - PG_DATABASE=fusionpbx
      - PG_HOST=XXX
      - PG_PORT=5432
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - ELASTICSEARCH=true
      - OPENSEARCH=false
      - CHECKPOINT_PATH=/usr/src/app
    depends_on:
      - redis
      - elasticsearch

entrypoint.sh

#!/usr/bin/env bash

wait-for-it $PG_HOST:$PG_PORT -t 60
wait-for-it $REDIS_HOST:$REDIS_PORT -t 60
wait-for-it $ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT -t 60

jq '.[].database = env.PG_DATABASE' schema.json | sponge schema.json

bootstrap -c ./schema.json -t
bootstrap --config ./schema.json
pgsync --config ./schema.json -d

schema.json

[
  {
    "database": "fusionpbx",
    "index": "v_xml_cdr",
    "nodes": {
      "table": "v_xml_cdr",
      "schema": "public",
      "columns": [
        "network_addr",
        "billsec",
        "hangup_cause",
        "domain_name",
        "domain_uuid",
        "start_epoch",
        "start_stamp",
        "end_epoch",
        "end_stamp",
        "direction",
        "extension_uuid",
        "source_number",
        "caller_id_number",
        "caller_id_name",
        "destination_number"
      ],
      "children": [
        {
          "table": "v_extensions",
          "columns": [
            "effective_caller_id_name",
            "effective_caller_id_number",
            "outbound_caller_id_name",
            "outbound_caller_id_number"
          ],
          "relationship": {
            "variant": "object",
            "type": "one_to_one",
            "foreign_key": {
              "child": [
                "extension_uuid"
              ],
              "parent": [
                "extension_uuid"
              ]
            }
          }
        },
        {
          "table": "v_domains",
          "columns": [
            "domain_description"
          ],
          "relationship": {
            "variant": "object",
            "type": "one_to_one",
            "foreign_key": {
              "child": [
                "domain_uuid"
              ],
              "parent": [
                "domain_uuid"
              ]
            }
          }
        }
      ]
    }
  }
]

docker compose up: brings up the stack and syncs fine. I just want it to keep working after reboots, without resyncing all the data again.

After the first startup, it takes around 10 minutes to sync (lots of data) and creates a checkpoint file, which I can see at the mountpoint: .fusionpbx_v_xml_cdr

docker compose down: the checkpoint file still exists after the stack is brought down. However, on the next startup it deletes this file and starts the sync from scratch, even duplicating data, which makes the data in Elasticsearch unreliable and incorrect.

http://localhost:9200/_cat/indices
yellow open v_xml_cdr p1umAsPtRxSmS-67EZqlSQ 1 1 3302175 0 937.5mb 937.5mb

After a second run, this grows to double the size...

What is your best advice?

After first run and before shutdown:

After second startup:

Thanks in advance !!

Answer 1 · 2023-09-09T07:17:38.000Z

May I stress, it's working perfectly, and I'm using the data in Grafana to build dashboards.

The only issue is that restarting the stack causes duplication of data.

Great project! Big fan 👍

Answer 2 · 2023-09-09T07:26:45.000Z

I checked the indices after about 10 minutes:

yellow open v_xml_cdr p1umAsPtRxSmS-67EZqlSQ 1 1 3322957 257988 1023.5mb 1023.5mb

It seems to have shrunk back down in size, but it still looks like there's duplicate data, as the original sync wasn't that big.
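
For reading the `_cat/indices` line above, a small sketch that labels the fields may help. It assumes the default column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size); the helper name `label_cat_indices` is hypothetical. A non-zero docs.deleted generally means documents were overwritten or deleted and not yet merged away, which may point to re-indexing over existing IDs rather than true duplicates. That interpretation is an assumption and is not confirmed anywhere in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: label the default columns of one `_cat/indices` line.
# Assumes the default column order; custom `h=` column lists would break this.
label_cat_indices() {
  awk '{ printf "health=%s status=%s index=%s docs.count=%s docs.deleted=%s store.size=%s\n", $1, $2, $3, $7, $8, $9 }'
}

# The line reported above, piped through the helper.
echo 'yellow open v_xml_cdr p1umAsPtRxSmS-67EZqlSQ 1 1 3322957 257988 1023.5mb 1023.5mb' | label_cat_indices
# prints: health=yellow status=open index=v_xml_cdr docs.count=3322957 docs.deleted=257988 store.size=1023.5mb
```

Comparing docs.count against a row count taken from Postgres would be one way to tell real duplicates apart from overwritten documents.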

Answer 3 · 2023-09-20T22:24:44.000Z

I think it's hard to tell why the index is so large from the data above.
There is some denormalization happening when you go from relational to Elastic/OpenSearch, so that is probably why data is getting duplicated.
Regarding the checkpoint, can you ensure there is a file named /usr/src/app/.fusionpbx_v_xml_cdr that is accessible from the container?

Sorry about the delay.

Answer 4 · 2024-05-06T13:11:23.000Z

Hi @toluaina,

I'm experiencing similar issues. PGSync itself works great, but any restart causes a full sync, which takes some time. With larger datasets it can take 3+ hours.

When I test it locally:

- If there are no checkpoint files, pgsync logs that and creates them.
- If the files are there, it deletes them and performs a full sync anyway.
- I can see the files in the container and can read them.

How can I verify if it actually starts from where it left off?
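
One rough way to check this is to record the checkpoint before stopping the stack and again after restarting it. The sketch below is only an illustration: `show_checkpoint` is a hypothetical helper, and the idea that the file holds a single resumable position value is an assumption about PGSync, not something stated in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the checkpoint file's content, or report that it is gone.
show_checkpoint() {
  local file="$1"
  if [ -r "$file" ]; then
    printf 'checkpoint %s = %s\n' "$file" "$(cat "$file")"
  else
    printf 'checkpoint %s is missing or unreadable\n' "$file"
  fi
}

# Possible use, run once before `docker compose down` and once after startup
# (path taken from the warning message earlier in this thread):
#   show_checkpoint /usr/src/app/.fusionpbx_v_xml_cdr
```

If the value is unchanged or higher after the restart and the index's document count does not jump, the sync most likely resumed. If the file is reported missing right after startup, something in the startup sequence removed it.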

Answer 5 · 2024-05-14T10:52:17.000Z

I just found out that calling the bootstrap process deletes the checkpoint files. Should bootstrap be called only on demand, when new indices are added or the schema is changed?
If I run just the pgsync command without calling bootstrap first, it immediately starts continuous sync.
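
Following the observation above, one possible way to restructure an entrypoint is to run bootstrap only when no checkpoint file exists yet. This is a minimal sketch under stated assumptions, not PGSync's documented behaviour: the helper names `checkpoint_file` and `needs_bootstrap` are hypothetical, and the `.<database>_<index>` naming is inferred from the path in the warning message in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of PGSync.

# Build the checkpoint path. Assumption: the file is named ".<database>_<index>"
# inside CHECKPOINT_PATH, as in /usr/src/app/.fusionpbx_v_xml_cdr from the warning.
checkpoint_file() {
  local dir="$1" database="$2" index="$3"
  printf '%s/.%s_%s\n' "$dir" "$database" "$index"
}

# Succeeds (exit 0) when no non-empty checkpoint file exists yet.
needs_bootstrap() {
  [ ! -s "$(checkpoint_file "$1" "$2" "$3")" ]
}

# Possible use in entrypoint.sh (commands taken from the original script):
#   if needs_bootstrap "$CHECKPOINT_PATH" "$PG_DATABASE" v_xml_cdr; then
#     bootstrap --config ./schema.json
#   fi
#   pgsync --config ./schema.json -d
```

The `bootstrap -c ./schema.json -t` line from the original script is left out of the sketch on purpose. Whether skipping bootstrap is safe after a schema change is not settled in this thread, so a manual bootstrap may still be needed in that case.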