Performance of ActiveRow::related()
MichaelPavlista opened this issue · 0 comments
During application performance analysis, I discovered that using the ActiveRow::related()
method significantly
slows down the application's execution time. The slowdown increases exponentially with the number of calls to this method.
Problem Description
The issue can be replicated with the following simple example,
which compares the same logic using ActiveRow::related()
versus manually constructed SQL.
Select batch of main and related rows using ActiveRow:
/** @var Explorer $databaseExplorer */
$databaseExplorer = $container->getByType(type: Explorer::class);
// Load batch rows from database.
$companyIdToRow = $databaseExplorer->table(table: 'company')
->select('*')
->order('id DESC')
->limit(BATCH_SIZE)
->fetchPairs('id');
/** @var ActiveRow $row */
foreach ($companyIdToRow as $row)
{
// Load related rows.
$relatedRows = $row->related('company_member')->fetchAll();
}
Select batch of main and related rows using manually constructed SQL:
// Select main rows.
$companyIdToRow = $databaseExplorer
->query('SELECT * FROM `company` ORDER BY `id` DESC LIMIT ?', BATCH_SIZE)
->fetchPairs('id');
// Create main rows ID list.
$companyIdList = array_keys($companyIdToRow);
// Select rows from the related table.
$companyIdToMemberRowList = $databaseExplorer
->query('SELECT `id`, `company_id` FROM `company_member` WHERE `company_id` IN (?)', $companyIdList)
->fetchAssoc('company_id[]->');
foreach ($companyIdToRow as $row)
{
// Load related rows.
$relatedRows = $companyIdToMemberRowList[$row['id']] ?? [];
}
Comparison of the variants
Number of rows (=BATCH_SIZE) | Execution Time [related] | Execution Time [manual SQL] | How much faster is manual SQL? |
---|---|---|---|
1 000 | 206 ms | 130 ms | 1.6x |
2 500 | 480 ms | 142 ms | 3.4x |
5 000 | 1 330 ms | 160 ms | 8.3x |
10 000 | 4 560 ms | 200 ms | 22.8x |
25 000 | 26 800 ms | 328 ms | 81.7x |
50 000 | 134 120 ms | 550 ms | 243.8x |
- The comparison was performed on PHP 8.3 without the Xdebug extension, using MariaDB 10.5.
- Both services ran on a local machine in Docker.
- In both cases, the same logic is executed, and the same two SQL queries are sent to the database. The performance issue lies within PHP.
- In the custom SQL, the columns
id
,company_id
are explicitly defined to match the query sent byActiveRow::related()
exactly. - Enabling or disabling the database's debug mode makes almost no difference.
- A dump of the test data: database-init.sql.gz.
General Notes
- In real applications, the delay caused by
ActiveRow::related()
is usually in the hundreds of milliseconds.
This is because:- Queries involving a large number of rows are rarely used.
- Common queries often target hundreds of rows, but multiple queries per page accumulate the delay.
Affected Versions
The issue is present in the latest version nette/database@3.2.2
and also exists in older versions.
Problematic Area
An analysis of the problematic area using Xdebug suggests that the majority of the script's runtime is spent repeatedly calling the SqlBuilder::getConditionHash
method.
Expected Behavior
Using ActiveRow::related()
should not significantly increase the script's runtime, at least in cases where no modifications are made to the selection before fetching.
# this code should always be fast
$relatedRows = $row->related('company_member')->fetchAll();
Docker Demo
I have prepared a demo of the issue (including a database), which can be easily run in Docker.
Setup
- Clone the repository from https://github.com/MichaelPavlista/nette-active-row.
- In the cloned folder, start the demo with the command
docker compose up
.
Test URLs:
- http://127.0.0.1/related.php (uses ActiveRow::related())
- http://127.0.0.1/related-optimised.php (uses manually constructed SQL)
You can change settings, such as the number of rows (BATCH_SIZE
), in bootstrap.local.php
.