AnatolyUss/nmig

Data Loading Stuck

perrornet opened this issue · 8 comments

I noticed yesterday that the logs show --[NMIG DataWriter] Loading the data into "public"."my_table" table..., but it has remained at this message until now. I'm unsure what is happening.

System Configuration

  • Running in K8S without any CPU or memory limits set.Running in K8S without any CPU or memory limits set

Configuration Information

{
    "source": {
        "host": "mysql",
        "port": 3306,
        "database": "test",
        "charset": "utf8mb4",
        "supportBigNumbers": true,
        "user": "xxx",
        "password": "xxxx"
    },
    "target": {
        "host": "postgres",
        "port": 5432,
        "database": "test",
        "charset": "UTF8",
        "user": "xxxx",
        "password": "xxxx",
        "ssl": {
            "require": true,
            "rejectUnauthorized": false
        }
    },
    "max_each_db_connection_pool_size": 100,
    "number_of_simultaneously_running_loader_processes": "8",
    "loader_max_old_space_size": "16384",
    "streams_high_water_mark": 16384,
    "encoding": "utf8",
    "schema": "public",
    "migrate_only_data": false,
    "delimiter": ",",
    "enable_extra_config": false
}

Logs

    NMIG - the database migration tool
    Copyright (C) 2016 - present, Anatoly Khaytovich <anatolyuss@gmail.com>

    --[boot] Configuration has been just loaded.
    --[boot] NMIG is restarting after some failure.
    --[boot] Consider checking log files at the end of migration.

    --[MigrationStateManager::createStateLogsTable] table "public"."state_logs_publicsubtest" is created...
    --[DataPoolManager::createDataPoolTable] table "public"."data_pool_publicsubtest" is created...
    --[TableProcessor::createTable] Currently creating table: `my_table`
    --[StructureLoader::default] Source DB structure is loaded...

        --[StructureLoader::default] Tables to migrate: 1
        --[StructureLoader::default] Views to migrate: 0
    --[TableProcessor::createTable] Table "public"."my_table" is created...
    --[DataPoolManager::readDataPool] Data-Pool is loaded...

    --[NMIG data transfer] "public"."my_table" DATA TRANSFER IN PROGRESS...
    --[NMIG data transfer] TIME REQUIRED FOR TRANSFER DEPENDS ON AMOUNT OF DATA...

    --[NMIG loadData] Loading the data into "public"."my_table" table...
    --[NMIG DataWriter] Loading the data into "public"."my_table" table...

The my_table contains a total of 202,488,638 rows, with a data size of approximately 396.05 GB.

When I executed SHOW PROCESSLIST in MySQL, I found no active processes. In Postgres, querying SELECT * FROM pg_stat_activity WHERE state NOT ILIKE 'idle%' showed only one SQL statement running: COPY "public"."chain_extrinsics" (xxxxx,xxx,xxxx,xxx,xxxx,xx) FROM STDIN WITH (FORMAT csv, DELIMITER ',', ENCODING 'UTF8');

Here is the process information from the pod:

  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 root     S    1209m   2%   6   0% /app/app -c /conf/config.json
   73    62 root     S     735m   1%   8   0% /usr/local/bin/node /app/nmig/dist/src/DataWriter.js --max-old-space-size=16384
   62    30 root     S     731m   1%  10   0% /usr/local/bin/node --max-old-space-size=16384 /app/nmig/dist/src/DataReader.js
   30    16 root     S     684m   1%  11   0% node dist/src/Main.js --conf-dir=/data/migrate/config --logs-dir=/data/migrate/logs
   16     1 root     S     681m   1%   8   0% npm start --conf-dir=/data/migrate/config --logs-dir=/data/migrate/logs
   51    30 root     S     677m   1%  12   0% /usr/local/bin/node /app/nmig/dist/src/LogsProcessor.js

Today I deleted the remaining files and database tables left after the last run, but when running again, it still got stuck at the same place.

  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 root     S    1208m   2%  15   0% /app/app -c /conf/config.json
   69    28 root     S     746m   1%  11   0% /usr/local/bin/node --max-old-space-size=16384 /app/nmig/dist/src/DataReader.js
   80    69 root     S     746m   1%   7   0% /usr/local/bin/node /app/nmig/dist/src/DataWriter.js --max-old-space-size=16384
   28    17 root     S     684m   1%  15   0% node dist/src/Main.js --conf-dir=/data/migrate/test/config --logs-dir=/data/migrate/test
   17     1 root     S     682m   1%   9   0% npm start --conf-dir=/data/migrate/test/config --logs-dir=/data/migrate/test/logs
   49    28 root     S     677m   1%  12   0% /usr/local/bin/node /app/nmig/dist/src/LogsProcessor.js
image

CPU usage:
image

Memory usage:
image

Network:
image

It seems that there is no more any activity after running for about 40 minutes.

I added some logs and found that it was indeed stuck:

--[NMIG loadData] 8/2/2024, 7:18:52 AM: my_table  total count: 1924946/111546685, totalSize: 80.87231482099742 GB
--[NMIG DataWriter] 8/2/2024, 7:18:55 AM: my_table  total count: 1725577/111546685, totalSize: 80.98360828589648 GB
--[NMIG loadData] 8/2/2024, 7:18:57 AM: my_table  total count: 1925282/111546685, totalSize: 81.02011202834547 GB
--[NMIG DataWriter] 8/2/2024, 7:19:00 AM: my_table  total count: 1728086/111546685, totalSize: 81.13046906515956 GB
--[NMIG loadData] 8/2/2024, 7:19:02 AM: my_table  total count: 1925613/111546685, totalSize: 81.16610423196107 GB
--[NMIG DataWriter] 8/2/2024, 7:19:06 AM: my_table  total count: 1730421/111546685, totalSize: 81.2668507695198 GB
--[NMIG loadData] 8/2/2024, 7:19:07 AM: my_table  total count: 1926112/111546685, totalSize: 81.29773158021271 GB
--[NMIG DataWriter] 8/2/2024, 7:19:11 AM: my_table  total count: 1732649/111546685, totalSize: 81.3966732956469 GB
--[NMIG loadData] 8/2/2024, 7:19:12 AM: my_table  total count: 1927716/111546685, totalSize: 81.4275922216475 GB
--[NMIG DataWriter] 8/2/2024, 7:19:16 AM: my_table  total count: 1735124/111546685, totalSize: 81.53275842126459 GB
--[NMIG loadData] 8/2/2024, 7:19:17 AM: my_table  total count: 1934095/111546685, totalSize: 81.56192161794752 GB
--[NMIG DataWriter] 8/2/2024, 7:19:21 AM: my_table  total count: 1737575/111546685, totalSize: 81.66759182140231 GB
--[NMIG loadData] 8/2/2024, 7:19:22 AM: my_table  total count: 1955326/111546685, totalSize: 81.7029321147129 GB

Any chance, that the "my_table" contains any kind of BLOB columns?

@perrornet I had the same issue. It seems like there was a partial migration that happened and was subsequently stopped. I deleted the database and schema and re ran the script and it worked again. (I did a whole investigation into ignoring tables, increasing pool connection but nothing work).

@perrornet I had the same issue. It seems like there was a partial migration that happened and was subsequently stopped. I deleted the database and schema and re ran the script and it worked again. (I did a whole investigation into ignoring tables, increasing pool connection but nothing work).

This solution does not work for me. I have modified the nmig code and no longer use the 'copy' command to import data. Instead, I am using bulk insert SQL. The migration has now been completed successfully.

Any chance, that the "my_table" contains any kind of BLOB columns?

Yes, one of the fields in "my_table" is of type longblob.

@perrornet Could you please share your version with better logging and inserts? I faced the same issue - if the target db is not empty, then data migration stuck in ~50% of runs