pingcap/tidb

High CPU Usage due to Excessive runtime.newobject Calls in tables.(*TableCommon).AddRecord during 1 TiB Data Import in TiDB Lightning


Enhancement

During a 1 TiB data import with TiDB Lightning, I observed unusually high CPU utilization. Profiling with pprof and inspecting the flame graph showed that tables.(*TableCommon).AddRecord, specifically the declaration var value types.Datum at line 920, is responsible for a large number of runtime.newobject calls. These frequent heap allocations consume CPU and slow down the import.

This appears to stem from repeated instantiation of types.Datum inside the method. Reducing these allocations should lower the CPU load and improve throughput during large data imports.

Could you please investigate whether reusing existing types.Datum objects, or keeping them in an object pool, would reduce this CPU overhead?

TiDB version: v7.5.0
CPU: 64 cores
Memory: 100 GiB
Config:

[checkpoint]
driver = "file"
dsn = "/etc/endpoint/cp.pb"
enable = true

[cron]
log-progress = "5m"
switch-mode = "5m"

[lightning]
check-requirements = true
file = ""
level = "info"
pprof-port = 8289
region-concurrency = 150
table-concurrency = 2
index-concurrency = 2
io-concurrency = 60

[mydumper]
strict-format = false
max-source-data-size = 10995116277760 #10TiB

[post-restore]
analyze = "optional"
checksum = true
checksum-via-sql = false

[tidb]
checksum-table-concurrency = 8
log-level = "error"
port = 4000
status-port = 10080
tls = "false"

[tikv-importer]
duplicate-resolution = "remove"
range-concurrency = 1
sorted-kv-dir = "/etc/endpoint"

Table structure

CREATE TABLE `github_events` (
`pid` bigint(20) NOT NULL AUTO_INCREMENT,
`id` bigint(20) DEFAULT NULL,
`type` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`repo_id` bigint(20) DEFAULT NULL,
`repo_name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`actor_id` bigint(20) DEFAULT NULL,
`actor_login` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`actor_location` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`language` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`additions` bigint(20) DEFAULT NULL,
`deletions` bigint(20) DEFAULT NULL,
`action` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`number` int(11) DEFAULT NULL,
`commit_id` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`comment_id` bigint(20) DEFAULT NULL,
`org_login` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`org_id` bigint(20) DEFAULT NULL,
`state` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`closed_at` datetime DEFAULT NULL,
`comments` int(11) DEFAULT NULL,
`pr_merged_at` datetime DEFAULT NULL,
`pr_merged` tinyint(1) DEFAULT NULL,
`pr_changed_files` int(11) DEFAULT NULL,
`pr_review_comments` int(11) DEFAULT NULL,
`pr_or_issue_id` bigint(20) DEFAULT NULL,
`event_day` date DEFAULT NULL,
`event_month` date DEFAULT NULL,
`author_association` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`event_year` int(11) DEFAULT NULL,
`push_size` int(11) DEFAULT NULL,
`push_distinct_size` int(11) DEFAULT NULL,
PRIMARY KEY (`pid`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci AUTO_INCREMENT=7398394651;
