nette/database

Performance of ActiveRow::related()

Opened this issue · 0 comments

During application performance analysis, I discovered that using the ActiveRow::related() method significantly
slows down the application's execution time. The slowdown increases exponentially with the number of calls to this method.

Problem Description

The issue can be replicated with the following simple example,
which compares the same logic using ActiveRow::related() versus manually constructed SQL.

Select batch of main and related rows using ActiveRow:

/** @var Explorer $databaseExplorer */
$databaseExplorer = $container->getByType(type: Explorer::class);

// Load batch rows from database.
$companyIdToRow = $databaseExplorer->table(table: 'company')
    ->select('*')
    ->order('id DESC')
    ->limit(BATCH_SIZE)
    ->fetchPairs('id');

/** @var ActiveRow $row */
foreach ($companyIdToRow as $row)
{
    // Load related rows.
    $relatedRows = $row->related('company_member')->fetchAll();
}

Select batch of main and related rows using manually constructed SQL:

// Select main rows.
$companyIdToRow = $databaseExplorer
    ->query('SELECT * FROM `company` ORDER BY `id` DESC LIMIT ?', BATCH_SIZE)
    ->fetchPairs('id');

// Create main rows ID list.
$companyIdList = array_keys($companyIdToRow);

// Select rows from the related table.
$companyIdToMemberRowList = $databaseExplorer
    ->query('SELECT `id`, `company_id` FROM `company_member` WHERE `company_id` IN (?)', $companyIdList)
    ->fetchAssoc('company_id[]->');

foreach ($companyIdToRow as $row)
{
    // Load related rows.
    $relatedRows = $companyIdToMemberRowList[$row['id']] ?? [];
}

Comparison of the variants

Number of rows (=BATCH_SIZE) Execution Time [related] Execution Time [manual SQL] How much faster is manual SQL?
1 000 206 ms 130 ms 1.6x
2 500 480 ms 142 ms 3.4x
5 000 1 330 ms 160 ms 8.3x
10 000 4 560 ms 200 ms 22.8x
25 000 26 800 ms 328 ms 81.7x
50 000 134 120 ms 550 ms 243.8x
  • The comparison was performed on PHP 8.3 without the Xdebug extension, using MariaDB 10.5.
  • Both services ran on a local machine in Docker.
  • In both cases, the same logic is executed, and the same two SQL queries are sent to the database. The performance issue lies within PHP.
  • In the custom SQL, the columns id, company_id are explicitly defined to match the query sent by ActiveRow::related() exactly.
  • Enabling or disabling the database's debug mode makes almost no difference.
  • A dump of the test data: database-init.sql.gz.

General Notes

  • In real applications, the delay caused by ActiveRow::related() is usually in the hundreds of milliseconds.
    This is because:
    • Queries involving a large number of rows are rarely used.
    • Common queries often target hundreds of rows, but multiple queries per page accumulate the delay.

Affected Versions

The issue is present in the latest version nette/database@3.2.2 and also exists in older versions.

Problematic Area

An analysis of the problematic area using Xdebug suggests that the majority of the script's runtime is spent repeatedly calling the SqlBuilder::getConditionHash method.

Expected Behavior

Using ActiveRow::related() should not significantly increase the script's runtime, at least in cases where no modifications are made to the selection before fetching.

# this code should always be fast
$relatedRows = $row->related('company_member')->fetchAll();

Docker Demo

I have prepared a demo of the issue (including a database), which can be easily run in Docker.

Setup

Test URLs:

You can change settings, such as the number of rows (BATCH_SIZE), in bootstrap.local.php.