sync-diff-inspector v7.1.1: Long execution time due to parsing data incorrect which COLLATE is utf8_general_ci
snowballbear opened this issue · 1 comments
Bug Report
Please answer these questions before submitting your issue. Thanks!
- What did you do?
MySQL:
CREATE TABLE items (
id varchar(255) NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO test.items (id) VALUES(CONCAT("m", round(rand()*100000000000)));
TiDB :
COLLATE=utf8mb4_bin
create database test_bin;
use test_bin;
CREATE TABLE items (
id varchar(255) NOT NULL,
PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
COLLATE=utf8mb4_general_ci
create database test_g_ci;
use test_g_ci;
CREATE TABLE items (
id varchar(255) NOT NULL,
PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
export data by using dumpling and import data into tidb cluster.
Sync-diff-inspector:
- sync-config-bin.toml
# Diff Configuration.
######################### Global config #########################
export-fix-sql = true
# check data also
check-struct-only = false
######################### Datasource config #########################
[data-sources]
[data-sources.mysql1]
host = "xxx.xx.xx.xx"
port = 3306
user = "xxxx"
password = "xxxx"
route-rules = ["rule1"]
[data-sources.tidb0]
host = "xxx"
port = 4000
user = "xxxx"
password = "xxxx"
########################### Routes ###########################
# mapping rules
[routes]
[routes.rule1]
schema-pattern = "test" # source db
target-schema = "test_bin" # target db
######################### Task config #########################
[task]
output-dir = "./output1"
source-instances = ["mysql1"]
target-instance = "tidb0"
# target tables to be checked
target-check-tables = ["test_bin.items"]
- sync-config-g-ci.toml
# Diff Configuration.
######################### Global config #########################
export-fix-sql = true
# check data also
check-struct-only = false
######################### Datasource config #########################
[data-sources]
[data-sources.mysql1]
host = "xxx.xx.xx.xx"
port = 3306
user = "xxxx"
password = "xxxx"
route-rules = ["rule1"]
[data-sources.tidb0]
host = "xxx"
port = 4000
user = "xxxx"
password = "xxxx"
########################### Routes ###########################
# mapping rules
[routes]
[routes.rule1]
schema-pattern = "test" # source db
target-schema = "test_g_ci" # target db
######################### Task config #########################
[task]
output-dir = "./output2"
source-instances = ["mysql1"]
target-instance = "tidb0"
# target tables to be checked
target-check-tables = ["test_g_ci.items"]
sync_diff_inspector --config=./sync-config-bin.toml
[ec2-user@ip-172-16-1-46 sync_diff]$ cat ./output1/sync_diff.log
[2023/09/13 01:45:50.339 +00:00] [INFO] [printer.go:46] ["Welcome to sync_diff_inspector"] ["Release Version"=v7.1.0] ["Git Commit Hash"=89b804103994cd63273fa9e57e71f270b1a42fcc] ["Git Branch"=heads/refs/tags/v7.1.0] ["UTC Build Time"="2023-05-24 03:09:18"] ["Go Version"=go1.20.3]
[2023/09/13 01:45:50.350 +00:00] [INFO] [main.go:101] [config="{\"check-thread-count\":4,\"split-thread-count\":5,\"export-fix-sql\":true,\"check-struct-only\":false,\"dm-addr\":\"\",\"dm-task\":\"\",\"data-sources\":{\"mysql1\":{\"host\":\"172.16.1.46\",\"port\":3306,\"user\":\"admin\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":[\"rule1\"],\"Router\":{\"Selector\":{}},\"Conn\":null},\"tidb0\":{\"host\":\"private-tidb.plouwpu5awb.clusters.tidb-cloud.com\",\"port\":4000,\"user\":\"root\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":null,\"Router\":{\"Selector\":{}},\"Conn\":null}},\"routes\":{\"rule1\":{\"schema-pattern\":\"test\",\"table-pattern\":\"\",\"target-schema\":\"test_bin\",\"target-table\":\"\"}},\"table-configs\":null,\"task\":{\"source-instances\":[\"mysql1\"],\"source-routes\":null,\"target-instance\":\"tidb0\",\"target-check-tables\":[\"test_bin.items\"],\"target-configs\":null,\"output-dir\":\"./output1\",\"SourceInstances\":[{\"host\":\"172.16.1.46\",\"port\":3306,\"user\":\"admin\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":[\"rule1\"],\"Router\":{\"Selector\":{}},\"Conn\":null}],\"TargetInstance\":{\"host\":\"private-tidb.plouwpu5awb.clusters.tidb-cloud.com\",\"port\":4000,\"user\":\"root\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":null,\"Router\":{\"Selector\":{}},\"Conn\":null},\"TargetTableConfigs\":null,\"TargetCheckTables\":[{}],\"FixDir\":\"output1/fix-on-tidb0\",\"CheckpointDir\":\"output1/checkpoint\",\"HashFile\":\"\"},\"ConfigFile\":\"./sync-config-bin.toml\",\"PrintVersion\":false}"]
[2023/09/13 01:45:50.432 +00:00] [INFO] [mysql_shard.go:372] ["will increase connection configurations for DB of instance"] ["connection limit"=10]
[2023/09/13 01:45:50.432 +00:00] [INFO] [source.go:412] ["table match check finished"]
[2023/09/13 01:45:50.434 +00:00] [INFO] [tidb.go:209] ["find router for tidb source"]
[2023/09/13 01:45:50.445 +00:00] [INFO] [source.go:412] ["table match check finished"]
[2023/09/13 01:45:50.448 +00:00] [INFO] [diff.go:363] ["The downstream is TiDB. pick it as work source first"]
[2023/09/13 01:45:50.550 +00:00] [INFO] [diff.go:191] ["not found checkpoint file, start from beginning"]
[2023/09/13 01:45:50.556 +00:00] [INFO] [diff.go:721] ["start writeSQLs goroutine"]
[2023/09/13 01:45:50.559 +00:00] [INFO] [diff.go:377] ["start handleCheckpoint goroutine"]
[2023/09/13 01:45:50.593 +00:00] [INFO] [bucket.go:210] ["get chunk size for table"] ["chunk size"=50000] [db=test_bin] [table=items]
[2023/09/13 01:45:50.594 +00:00] [INFO] [bucket.go:103] ["close chunks channel for table"] [schema=test_bin] [table=items]
[2023/09/13 01:45:50.594 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":129,\"bucket-index-right\":171,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"m55408434063\",\"upper\":\"m70381241730\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:45:50.594 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":172,\"bucket-index-right\":214,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"m70381241730\",\"upper\":\"m85510843141\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:45:50.594 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":215,\"bucket-index-right\":256,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"m85510843141\",\"upper\":\"\",\"has-lower\":true,\"has-upper\":false}]"]
[2023/09/13 01:45:50.594 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":0,\"bucket-index-right\":42,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\",\"upper\":\"m25131780026\",\"has-lower\":false,\"has-upper\":true}]"]
[2023/09/13 01:45:50.594 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":43,\"bucket-index-right\":85,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"m25131780026\",\"upper\":\"m40253386870\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:45:50.703 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":86,\"bucket-index-right\":128,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"m40253386870\",\"upper\":\"m55408434063\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:45:50.767 +00:00] [INFO] [diff.go:732] ["write sql channel closed"]
[2023/09/13 01:45:50.767 +00:00] [INFO] [diff.go:723] ["close writeSQLs goroutine"]
[2023/09/13 01:45:50.767 +00:00] [INFO] [diff.go:405] ["Stop do checkpoint"]
[2023/09/13 01:45:50.767 +00:00] [INFO] [checkpoints.go:225] ["save checkpoint"] [chunk="{\"state\":\"success\",\"chunk-range\":{\"index\":{\"table-index\":0,\"bucket-index-left\":215,\"bucket-index-right\":256,\"chunk-index\":0,\"chunk-count\":1},\"type\":1,\"bounds\":[{\"column\":\"id\",\"lower\":\"m85510843141\",\"upper\":\"\",\"has-lower\":true,\"has-upper\":false}],\"is-first\":false,\"is-last\":false,\"where\":\"(((`id` \\u003e ?)) AND (TRUE))\",\"args\":[\"m85510843141\"]},\"index-id\":1}"] [state=success]
[2023/09/13 01:45:50.767 +00:00] [INFO] [diff.go:379] ["close handleCheckpoint goroutine"]
[2023/09/13 01:45:50.780 +00:00] [INFO] [main.go:114] ["check data finished"] [cost=428.311227ms]
[2023/09/13 01:45:50.781 +00:00] [INFO] [main.go:108] ["check pass!!!"]
sync_diff_inspector --config=./sync-config-g-ci.toml
[ec2-user@ip-172-16-1-46 ~]$ cat ./sync_diff/output2/sync_diff.log
[2023/09/13 01:46:39.162 +00:00] [INFO] [printer.go:46] ["Welcome to sync_diff_inspector"] ["Release Version"=v7.1.0] ["Git Commit Hash"=89b804103994cd63273fa9e57e71f270b1a42fcc] ["Git Branch"=heads/refs/tags/v7.1.0] ["UTC Build Time"="2023-05-24 03:09:18"] ["Go Version"=go1.20.3]
[2023/09/13 01:46:39.162 +00:00] [INFO] [main.go:101] [config="{\"check-thread-count\":4,\"split-thread-count\":5,\"export-fix-sql\":true,\"check-struct-only\":false,\"dm-addr\":\"\",\"dm-task\":\"\",\"data-sources\":{\"mysql1\":{\"host\":\"172.16.1.46\",\"port\":3306,\"user\":\"admin\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":[\"rule1\"],\"Router\":{\"Selector\":{}},\"Conn\":null},\"tidb0\":{\"host\":\"private-tidb.plouwpu5awb.clusters.tidb-cloud.com\",\"port\":4000,\"user\":\"root\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":null,\"Router\":{\"Selector\":{}},\"Conn\":null}},\"routes\":{\"rule1\":{\"schema-pattern\":\"test\",\"table-pattern\":\"\",\"target-schema\":\"test_g_ci\",\"target-table\":\"\"}},\"table-configs\":null,\"task\":{\"source-instances\":[\"mysql1\"],\"source-routes\":null,\"target-instance\":\"tidb0\",\"target-check-tables\":[\"test_g_ci.items\"],\"target-configs\":null,\"output-dir\":\"./output2\",\"SourceInstances\":[{\"host\":\"172.16.1.46\",\"port\":3306,\"user\":\"admin\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":[\"rule1\"],\"Router\":{\"Selector\":{}},\"Conn\":null}],\"TargetInstance\":{\"host\":\"private-tidb.plouwpu5awb.clusters.tidb-cloud.com\",\"port\":4000,\"user\":\"root\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":null,\"Router\":{\"Selector\":{}},\"Conn\":null},\"TargetTableConfigs\":null,\"TargetCheckTables\":[{}],\"FixDir\":\"output2/fix-on-tidb0\",\"CheckpointDir\":\"output2/checkpoint\",\"HashFile\":\"\"},\"ConfigFile\":\"./sync-config-g-ci.toml\",\"PrintVersion\":false}"]
[2023/09/13 01:46:39.193 +00:00] [INFO] [mysql_shard.go:372] ["will increase connection configurations for DB of instance"] ["connection limit"=10]
[2023/09/13 01:46:39.193 +00:00] [INFO] [source.go:412] ["table match check finished"]
[2023/09/13 01:46:39.194 +00:00] [INFO] [tidb.go:209] ["find router for tidb source"]
[2023/09/13 01:46:39.203 +00:00] [INFO] [source.go:412] ["table match check finished"]
[2023/09/13 01:46:39.206 +00:00] [INFO] [diff.go:363] ["The downstream is TiDB. pick it as work source first"]
[2023/09/13 01:46:39.283 +00:00] [INFO] [diff.go:191] ["not found checkpoint file, start from beginning"]
[2023/09/13 01:46:39.286 +00:00] [WARN] [utils.go:335] ["Ignoring collation differences"] ["column name"=id] ["collation source"=utf8mb4_bin] ["collation target"=utf8mb4_general_ci]
[2023/09/13 01:46:39.286 +00:00] [INFO] [diff.go:721] ["start writeSQLs goroutine"]
[2023/09/13 01:46:39.286 +00:00] [INFO] [diff.go:377] ["start handleCheckpoint goroutine"]
[2023/09/13 01:46:39.295 +00:00] [INFO] [bucket.go:210] ["get chunk size for table"] ["chunk size"=50000] [db=test_g_ci] [table=items]
[2023/09/13 01:46:39.295 +00:00] [INFO] [bucket.go:103] ["close chunks channel for table"] [schema=test_g_ci] [table=items]
[2023/09/13 01:46:39.295 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":129,\"bucket-index-right\":171,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00005\\u00005\\u00003\\u00007\\u00005\\u00009\\u00006\\u00007\\u00005\\u00009\\u00009\",\"upper\":\"\\u0000M\\u00007\\u00000\\u00004\\u00002\\u00005\\u00002\\u00005\\u00004\\u00005\\u00004\\u00005\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:46:39.296 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":172,\"bucket-index-right\":214,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00007\\u00000\\u00004\\u00002\\u00005\\u00002\\u00005\\u00004\\u00005\\u00004\\u00005\",\"upper\":\"\\u0000M\\u00008\\u00005\\u00005\\u00009\\u00003\\u00000\\u00002\\u00007\\u00001\\u00004\\u00002\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:46:39.296 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":215,\"bucket-index-right\":256,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00008\\u00005\\u00005\\u00009\\u00003\\u00000\\u00002\\u00007\\u00001\\u00004\\u00002\",\"upper\":\"\",\"has-lower\":true,\"has-upper\":false}]"]
[2023/09/13 01:46:39.296 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":0,\"bucket-index-right\":42,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\",\"upper\":\"\\u0000M\\u00002\\u00005\\u00001\\u00007\\u00005\\u00001\\u00004\\u00006\\u00002\\u00002\\u00003\",\"has-lower\":false,\"has-upper\":true}]"]
[2023/09/13 01:46:39.296 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":43,\"bucket-index-right\":85,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00002\\u00005\\u00001\\u00007\\u00005\\u00001\\u00004\\u00006\\u00002\\u00002\\u00003\",\"upper\":\"\\u0000M\\u00004\\u00000\\u00003\\u00006\\u00004\\u00009\\u00007\\u00001\\u00000\\u00004\\u00009\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:46:39.312 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":86,\"bucket-index-right\":128,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00004\\u00000\\u00003\\u00006\\u00004\\u00009\\u00007\\u00001\\u00000\\u00004\\u00009\",\"upper\":\"\\u0000M\\u00005\\u00005\\u00003\\u00007\\u00005\\u00009\\u00006\\u00007\\u00005\\u00009\\u00009\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:46:39.586 +00:00] [INFO] [diff.go:732] ["write sql channel closed"]
[2023/09/13 01:46:39.586 +00:00] [INFO] [diff.go:723] ["close writeSQLs goroutine"]
[2023/09/13 01:46:39.586 +00:00] [INFO] [diff.go:405] ["Stop do checkpoint"]
[2023/09/13 01:46:39.586 +00:00] [INFO] [checkpoints.go:225] ["save checkpoint"] [chunk="{\"state\":\"success\",\"chunk-range\":{\"index\":{\"table-index\":0,\"bucket-index-left\":215,\"bucket-index-right\":256,\"chunk-index\":0,\"chunk-count\":1},\"type\":1,\"bounds\":[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00008\\u00005\\u00005\\u00009\\u00003\\u00000\\u00002\\u00007\\u00001\\u00004\\u00002\",\"upper\":\"\",\"has-lower\":true,\"has-upper\":false}],\"is-first\":false,\"is-last\":false,\"where\":\"(((`id` \\u003e ?)) AND (TRUE))\",\"args\":[\"\\u0000M\\u00008\\u00005\\u00005\\u00009\\u00003\\u00000\\u00002\\u00007\\u00001\\u00004\\u00002\"]},\"index-id\":1}"] [state=success]
[2023/09/13 01:46:39.587 +00:00] [INFO] [diff.go:379] ["close handleCheckpoint goroutine"]
[2023/09/13 01:46:39.596 +00:00] [INFO] [main.go:114] ["check data finished"] [cost=433.003467ms]
[2023/09/13 01:46:39.596 +00:00] [INFO] [main.go:108] ["check pass!!!"]
mysql> show stats_buckets where Table_name='items';
+-----------+------------+----------------+-------------+----------+-----------+--------+---------+--------------------------+--------------------------+------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Bucket_id | Count | Repeats | Lower_Bound | Upper_Bound | Ndv |
+-----------+------------+----------------+-------------+----------+-----------+--------+---------+--------------------------+--------------------------+------+
| test_bin | items | | id | 0 | 0 | 1174 | 1 | m10000063581 | m10364587056 | 0 |
| test_bin | items | | id | 0 | 1 | 2349 | 1 | m10364941331 | m10736559528 | 0 |
| test_bin | items | | id | 0 | 2 | 3524 | 1 | m10736698574 | m11081677034 | 0 |
| test_bin | items | | id | 0 | 3 | 4699 | 1 | m11081683367 | m11436310887 | 0 |
| test_bin | items | | id | 0 | 4 | 5874 | 1 | m11436381015 | m11770389706 | 0 |
....
....
| test_g_ci | items | | PRIMARY | 1 | 251 | 295994 | 1 | M 9 8 2 5 1 0 9 2 0 6 0 | M 9 8 6 2 4 6 5 5 1 4 | 0 |
| test_g_ci | items | | PRIMARY | 1 | 252 | 297169 | 1 | M 9 8 6 2 6 9 5 8 4 8 4 | M 9 8 9 7 9 9 5 9 5 4 3 | 0 |
| test_g_ci | items | | PRIMARY | 1 | 253 | 298344 | 1 | M 9 8 9 8 0 5 6 7 4 1 7 | M 9 9 3 2 4 5 7 1 2 4 8 | 0 |
| test_g_ci | items | | PRIMARY | 1 | 254 | 299518 | 1 | M 9 9 3 2 5 5 8 8 4 3 8 | M 9 9 6 7 7 5 0 3 3 8 8 | 0 |
| test_g_ci | items | | PRIMARY | 1 | 255 | 300655 | 1 | M 9 9 6 7 8 6 7 7 5 0 6 | M 9 9 9 9 9 3 4 3 3 2 4 | 0 |
+-----------+------------+----------------+-------------+----------+-----------+--------+---------+--------------------------+--------------------------+------+
1024 rows in set (0.02 sec)
-
What did you expect to see?
Comparison can be completed. -
What did you see instead?
The amount of data compared is 3414406630 that it hasn't been finished for more than 5.5 hours.
-
What version of TiDB are you using (
tidb-server -V
or runselect tidb_version();
on TiDB)?
v7.1.1 -
which tool are you using?
sync-diff-inspector v7.1.1 -
what versionof tool are you using (
pump -V
ortidb-lightning -V
orsyncer -V
)?
sync-diff-inspector v7.1.1
/found gs