Allowed memory size exhausted during dump
DanieliMi opened this issue ยท 22 comments
Preconditions
GdprDump Version: 2.2.0
PHP Version: 7.4.25
Database Version: 10.3.31-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04
Steps to reproduce
- Shopware 6 database
- Config:
gdpr-dump.yml
database:
host: '%env(DATABASE_HOST)%'
port: '%env(DATABASE_PORT)%'
user: '%env(DATABASE_USER)%'
password: '%env(DATABASE_PASSWORD)%'
name: '%env(DATABASE_NAME)%'
dump:
hex_blob: true
skip_definer: true
tables:
acl_user_role:
truncate: true
cart:
truncate: true
customer:
truncate: true
customer_address:
truncate: true
customer_recovery:
truncate: true
dead_recovery:
truncate: true
enqueue:
truncate: true
google_shopping_account:
truncate: true
google_shopping_merchant_account:
truncate: true
integration:
truncate: true
log_entry:
truncate: true
message_queue_stats:
truncate: true
newsletter_recipient:
truncate: true
order:
truncate: true
order_address:
truncate: true
order_customer:
truncate: true
product_review:
truncate: true
refresh_token:
truncate: true
scheduled_task:
truncate: true
user:
truncate: true
user_recovery:
truncate: true
version:
truncate: true
version_commit:
truncate: true
version_commit_data:
truncate: true
- Create dump:
gdpr-dump gdpr-dump.yaml | sed 's$VALUES ($VALUES\n($g' | sed 's$),($),\n($g' > sql/master.sql
Expected result
A dump is created.
Actual result
PHP Fatal error
PHP Fatal error: Allowed memory size of 4294967296 bytes exhausted (tried to allocate 20480 bytes) in phar:///usr/local/bin/gdpr-dump/src/Database/TableDependencyResolver.php on line 2
Fatal error: Allowed memory size of 4294967296 bytes exhausted (tried to allocate 20480 bytes) in phar:///usr/local/bin/gdpr-dump/src/Database/TableDependencyResolver.php on line 2
I'll try to clone the repo and debug into the TableDependencyResolver. It seems to me there might be an cyclic dependencies.
When I remove
user:
truncate: true
I am getting the following error:
SQLSTATE[21000]: Cardinality violation: 1241 Operand should contain 1 column(s)
SET autocommit=0;
When removing
order:
truncate: true
as well it works just fine. I am a bit confused in how I can figure out what the problem is with those tables.
Debugging this together with Daniel...
It is possible with
SELECT *
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_NAME = 'user'
AND CONSTRAINT_SCHEMA = 'your_database_goes_here'
GROUP BY TABLE_NAME
to find out the forein keys.
But adding all the tables returned here (and the same for orders
) also does not help.
The problem is in this recursive function.
If we add
|| isset($resolved[$dependencyTable][$tableName])
to
the resolver seems to finish but another problem appears.
Current assumption is that it is not detecting cyclic dependencies which are like A -> B -> A but only A -> A.
Because we have
- sales_channel_domain -> sales_channel
and also
- sales_channel -> sales_channel_domain
@guvra Question: What is the purpose of the foreign key resolving? If we just do nothing in the fuction (return $resolved;
) at the beginning a dump is created, but we do not know how consistent it is.
We created text file of the foreign-key-relations for the graphviz tool and ran the nodes echod in that function through sort
and uniq
, then visualized with dot
When trying to avoid adding a dependency that has been already resolve like this:
// Detect cyclic dependencies
if ($dependencyTable === $tableName || isset($resolved[$dependencyTable][$tableName])) {
continue;
}
We're getting the following error:
Fatal error: Allowed memory size of 4294967296 bytes exhausted (tried to allocate 20480 bytes) in vendor/doctrine/dbal/lib/Doctrine/DBAL/Query/Expression/CompositeExpression.php on line 76
PHP Fatal error: Allowed memory size of 4294967296 bytes exhausted (tried to allocate 20480 bytes) in vendor/doctrine/dbal/lib/Doctrine/DBAL/Query/Expression/CompositeExpression.php on line 76
Hi,
When gdpr-dump removes rows (truncate: true
or limit: xx
), it also removes all related rows from other tables.
Table dependencies are resolved in the class TableDependencyResolver.
The rows are then filtered out in the class TableFilterExtension.
Cyclic dependencies are detected to avoid infinite loops.
However, only cyclic dependencies on the same table are detected (table with a foreign key on itself).
The situation where two tables depend on each other is not currently detected by the dependency resolver.
So to solve this problem, you need to either update the table dependency resolver, or disable the table filter extension (there's not config for that yet, it has to be done manually in the code).
The cardinality violation error seems to be a different issue though.
@guvra Thanks for the explenation, that seems like a pretty cool feature.
We think we fixed the first issue with || isset($resolved[$dependencyTable][$tableName])
and will make a pull request.
@DanieliMi Will dig into the Cardinality Problem in CompositeExpression
. It does not seem to happen, when there are no dependencies resolved so it seems to be slightly realted at least.
I've pushed a WIP commit. I did not look into the followup issue mentioned in #57 (comment).
Fatal error: Allowed memory size of 4294967296 bytes exhausted (tried to allocate 20480 bytes) in vendor/doctrine/dbal/lib/Doctrine/DBAL/Query/Expression/CompositeExpression.php on line 76
PHP Fatal error: Allowed memory size of 4294967296 bytes exhausted (tried to allocate 20480 bytes) in vendor/doctrine/dbal/lib/Doctrine/DBAL/Query/Expression/CompositeExpression.php on line 76
Is due to a cyclic dependency within \Smile\GdprDump\Dumper\Mysqldump\TableFilterExtension::addDependentFilter
In the case of the cycle $subQuery->getMaxResults()
(ref) return null because the sql of $subQuery
is null
.
If I change the condition to > 0
I am running into an SQL Eror:
SQLSTATE[21000]: Cardinality violation: 1241 Operand should contain 1 column(s)
It seems there's a mismatch between $dependency->getForeignColumns()
and $dependency->getLocalColumns()
somewhere in your database.
You could easily check this by adding var_dumps in the addDependentFilter function (or using xdebug).
I couldn't find any mismatch but I found cases with two foreign keys between two tables like this:
table localColumns foreignTable foreignColumns
product_translation [product_id, product_version_id] product [id, version_id]
I think there are in total 3 issues here:
- The dependency cycle in which leads to the memory exhaustion. I did manage to reproduce it in the unit test and fixed it accordingly.
- Another dependency cycle. This time in
\Smile\GdprDump\Dumper\Mysqldump\TableFilterExtension::addDependentFilter
. @amenk and I believe we have a solution for this as well but it is a bit more tricky since I am not understanding the whole process here and I did not manage to write a failing test for it. - The
Cardinality violation
issue. I think this is due to the two foreign keys for two tables.\Smile\GdprDump\Dumper\Mysqldump\TableFilterExtension::getColumnsSql
is used to get the columns for the select in the sub query and to get the columns used as left expression in comparison. The method adds parentheses around the columns which is no problem in the comparison and is no problem when it's one column in the select but if you wrap two columns in parentheses you will produce theCardinality violation
error. Example:
SELECT (`id`) FROM `products`; #works
SELECT (`id`, `name`) FROM `products`; #error
@guvra I will provide a PR with solutions for all 3 cases. The first case should be clear with the test but as I said I did not manage to create a failing test (I tried to create two tables that reference each other like A -> B, B -> A) maybe you have an idea? For the third case I do not have time to create a test right now but it should be fairly straight forward. I think you know the code the best and might have an idea whether our solutions make sense or are completely off.
Just to make sure I understood your problems, could you test if the commit I just added fixes all your issues?
I looked at your commit and I'm not sure why you need to add a stopping condition in addDependentFilter
.
@guvra I can dump without any error on the branch fix-multi-column-fk and I compared a dump from your branch with #58 and they are identical:
$ diff our_master.sql master.sql
6c6
< -- Date: Fri, 05 Nov 2021 09:13:39 +0100
---
> -- Date: Fri, 05 Nov 2021 09:14:45 +0100
8858c8858
< -- Dump completed on: Fri, 05 Nov 2021 09:13:39 +0100
---
> -- Dump completed on: Fri, 05 Nov 2021 09:14:45 +0100
I wondered if changing $subQuery->getMaxResults() !== 0
to $subQuery->getMaxResults() > 0
would be enough. I tried that yesterday but got some major differences in the created dumps. Maybe I had some other changes with influences at the time I tried.
@DanieliMi OK thanks for testing, so this commit solves all the issues you had?
If that's the case I can add a few functional tests and create a new release that includes this commit.
@DanieliMi OK thanks for testing, so this commit solves all the issues you had?
Yes it does.
If that's the case I can add a few functional tests and create a new release that includes this commit.
That would be amazing.
@DanieliMi I'm reopening this, after adding more functional tests, I realized there is a regression if I replace !== 0
with > 0
in the table filter extension.
It seems that getMaxResults
returns null
when the query matches all the rows of the table.
So the if
statement must be evaluated to true
if the function returns null
.
Changing the condition to > 0
was not a real fix.
Fixed in commit 0291602 on master branch
@DanieliMi could you test this fix?
It should fix all issues without introducing any regression.
I added a lot of unit / functional tests to make sure that cyclic dependencies are properly detected, and that filters are also properly applied.
Hi @guvra I tested your commit and I did not run into any issues. The issues are solved. Thank you very much for your effort!
@DanieliMi OK thanks, I've released version 2.2.1 ๐
Hopefully it will be easier to fix this kind of error in the future, there are a lot more unit/functional tests, and the complex functions are now documented (in the PHPDoc of the function, e.g. getDependencies in TableDependencyResolver).